Abstract The purpose of this paper is to estimate the intensity of a Poisson process $N$ by using
thresholding rules. In this paper, the intensity, defined as the derivative of the mean measure of $N$
with respect to $n\,dx$ where $n$ is a fixed parameter, is assumed to be non-compactly supported. The
estimator $\tilde f_{n,\gamma}$ based on random thresholds is proved to achieve the same performance as the oracle
estimator up to a possible logarithmic term. Then, minimax properties of $\tilde f_{n,\gamma}$ on Besov spaces
$\mathcal{B}^\alpha_{p,q}$ are established. Under mild assumptions, we prove that

$$\sup_{f \in \mathcal{B}^\alpha_{p,q} \cap \mathbb{L}_\infty} \mathbb{E}\big(\|\tilde f_{n,\gamma} - f\|_2^2\big) \le C \left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha + \frac12 + \left(\frac12 - \frac1p\right)_+}}$$

and the lower bound of the minimax risk for $\mathcal{B}^\alpha_{p,q} \cap \mathbb{L}_\infty$ coincides with the previous upper bound up to
the logarithmic term. This new result has two consequences. First, it establishes that the minimax
rate of Besov spaces $\mathcal{B}^\alpha_{p,q}$ with $p \le 2$ when non-compactly supported functions are considered is
the same as for compactly supported functions up to a logarithmic term. When $p > 2$, the rate
exponent, which depends on $p$, deteriorates when $p$ increases, which means that the support plays
a harmful role in this case. Furthermore, $\tilde f_{n,\gamma}$ is adaptive minimax up to a logarithmic term.

1 Introduction

The goal of the present paper is to derive a data-driven thresholding method to estimate the
intensity of a Poisson process on the real line.

Poisson processes have been used for years to model a wide variety of situations, and in particular
data whose maximal size is a priori unknown. For instance, in finance, Merton [29] introduces
Poisson processes to model stock-price changes of extraordinary magnitude. In geology, Uhler and
Bradley [32] use Poisson processes to model the occurrences of petroleum reservoirs whose size is
highly inhomogeneous. Actually, if we only focus on the size of the jumps in Merton's model or
on the sizes of individual oil reservoirs, these models consist of an inhomogeneous Poisson process
with heavy-tailed intensities (see [3] for a precise formalism for the financial example). So, our
goal is to provide data-driven estimation of a Poisson intensity with as few support assumptions as
possible.

Of course, many adaptive methods have been proposed to deal with Poisson intensity estimation.
For instance, Rudemo [31] studied data-driven histogram and kernel estimates based on
the cross-validation method. Donoho [IB] fitted the universal thresholding procedure proposed by
Donoho and Johnstone [17] by using Anscombe's transform. Kolaczyk [28] refined this idea by
investigating the tails of the distribution of the noisy wavelet coefficients of the intensity. For a
particular inverse problem, Cavalier and Koo [10] first derived optimal estimates in the minimax
setting. More precisely, for their tomographic problem, Cavalier and Koo [10] pointed out minimax
thresholding rules on Besov balls. By using model selection, other optimal estimators have been
proposed by Reynaud-Bouret [30] or Willet and Nowak [33].

To derive sharp theoretical results, these methods need to assume that the intensity has a known 
bounded support and belongs to $\mathbb{L}_\infty$. Model selection may make it possible to remove the assumption on the
support. See oracle results established by [l^ ] who nevertheless assumes that the intensity belongs
to $\mathbb{L}_\infty$. We have to mention that the model selection methodology proposed by Baraud and Birgé
[3] , 0] is "assumption-free" as well. However, as explained by Birge [3j , it is too computationally 
intensive to be implemented. Besides, in [3], 0] and 19], minimax performances on classical 
functional spaces are derived only for compactly supported signals. 

In the present paper, to estimate the intensity of a Poisson process, we propose an easily 
implementable thresholding rule specified in the next section. This procedure is near optimal 
under oracle and minimax points of view. We do not assume that the support of the intensity is 
known or even finite and most of the time, the signal to estimate may be unbounded. 



1.1 The thresholding procedure and main result 

In the sequel, we consider a Poisson process on the real line, denoted $N$, whose mean measure $\mu$
is finite and absolutely continuous with respect to the Lebesgue measure (see Section 2.1 where we
recall classical facts on Poisson processes). Given $n$ a positive integer, we introduce $f \in \mathbb{L}_1(\mathbb{R})$, the
intensity of $N$, as

$$f(x) = \frac{d\mu_x}{n\,dx}.$$

Since $f$ belongs to $\mathbb{L}_1(\mathbb{R})$, the total number of points of the process $N$, denoted $N_{\mathbb{R}}$, satisfies
$\mathbb{E}(N_{\mathbb{R}}) = n\|f\|_1$ and $N_{\mathbb{R}} < \infty$ almost surely. In the sequel, $f$ will be held fixed and $n$ will go to
$+\infty$. The introduction of $n$ could seem artificial, but it allows us to present our asymptotic theoretical
results in a meaningful way. In addition, our framework is equivalent to the observation of an
$n$-sample of a Poisson process with common intensity $f$ with respect to the Lebesgue measure. Since
$N$ is a random countable set of points, we denote by $dN$ the discrete random measure $\sum_{T \in N} \delta_T$.
Hence we have for any compactly supported function $g$, $\int g(x)\,dN_x = \sum_{T \in N} g(T)$. Now, our goal
is to estimate $f$ by using the realizations of $N$.

For this purpose, we assume that $f$ belongs to $\mathbb{L}_2(\mathbb{R})$ and we use the decomposition of $f$ on one
of the biorthogonal wavelet bases described in Section 2.2. We recall that, as for classical orthonormal
wavelet bases, biorthogonal wavelet bases are generated by dilations and translations of father and
mother wavelets. But considering biorthogonal wavelets makes it possible to distinguish, if necessary, wavelets
for analysis (that are piecewise constant functions in this paper) and wavelets for reconstruction
with a prescribed number of continuous derivatives. Then, the decomposition of $f$ on a biorthogonal
wavelet basis takes the following form:

$$f = \sum_{k \in \mathbb{Z}} \alpha_k \tilde\phi_k + \sum_{j \ge 0} \sum_{k \in \mathbb{Z}} \beta_{j,k} \tilde\psi_{j,k}, \qquad (1.1)$$

where for any $j \ge 0$ and any $k \in \mathbb{Z}$,

$$\alpha_k = \int_{\mathbb{R}} f(x)\phi_k(x)\,dx, \qquad \beta_{j,k} = \int_{\mathbb{R}} f(x)\psi_{j,k}(x)\,dx.$$

See Section 2.2 for further details. To shorten mathematical expressions, we set

$$\Lambda = \{\lambda = (j,k) : j \ge -1,\ k \in \mathbb{Z}\}$$

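and for any $\lambda \in \Lambda$, $\varphi_\lambda = \phi_k$ (respectively $\tilde\varphi_\lambda = \tilde\phi_k$) if $\lambda = (-1,k)$ and $\varphi_\lambda = \psi_{j,k}$ (respectively
$\tilde\varphi_\lambda = \tilde\psi_{j,k}$) if $\lambda = (j,k)$ with $j \ge 0$. Similarly, $\beta_\lambda = \alpha_k$ if $\lambda = (-1,k)$ and $\beta_\lambda = \beta_{j,k}$ if $\lambda = (j,k)$
with $j \ge 0$. Now, (1.1) can be rewritten as

$$f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda. \qquad (1.2)$$

In particular, (1.2) holds for the Haar basis where in this case $\tilde\varphi_\lambda = \varphi_\lambda$. Now, let us define the
thresholding estimate of $f$ by using the properties of Poisson processes. First, we introduce for any
$\lambda \in \Lambda$, the natural estimator of $\beta_\lambda$ defined by

$$\hat\beta_\lambda = \frac{1}{n} \int_{\mathbb{R}} \varphi_\lambda(x)\,dN_x \qquad (1.3)$$

that satisfies $\mathbb{E}(\hat\beta_\lambda) = \beta_\lambda$. Then, given some parameter $\gamma > 0$, we define the threshold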
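
$$\eta_{\lambda,\gamma} = \sqrt{2\gamma \tilde V_{\lambda,n} \log n} + \frac{\gamma \log n}{3n} \|\varphi_\lambda\|_\infty \qquad (1.4)$$

with

$$\tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma \log n \, \hat V_{\lambda,n} \frac{\|\varphi_\lambda\|_\infty^2}{n^2}} + 3\gamma \log n \frac{\|\varphi_\lambda\|_\infty^2}{n^2},$$

where

$$\hat V_{\lambda,n} = \frac{1}{n^2} \int_{\mathbb{R}} \varphi_\lambda^2(x)\,dN_x.$$

Note that $\hat V_{\lambda,n}$ satisfies $\mathbb{E}(\hat V_{\lambda,n}) = V_{\lambda,n}$, where

$$V_{\lambda,n} = \mathrm{Var}(\hat\beta_\lambda) = \frac{1}{n} \int_{\mathbb{R}} \varphi_\lambda^2(x) f(x)\,dx.$$
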
Finally, given some subset $\Gamma_n$ of $\Lambda$ of the form

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\},$$

where $j_0 = j_0(n)$ is an integer, we set for any $\lambda \in \Lambda$,

$$\tilde\beta_\lambda = \hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}\}} 1_{\{\lambda \in \Gamma_n\}}$$

and we set $\tilde\beta = (\tilde\beta_\lambda)_{\lambda \in \Lambda}$. Finally, the estimator of $f$ is

$$\tilde f_{n,\gamma} = \sum_{\lambda \in \Lambda} \tilde\beta_\lambda \tilde\varphi_\lambda \qquad (1.5)$$

and only depends on the choice of $\gamma$ and $j_0$ fixed later. When the Haar basis is used, the estimate
is denoted $\tilde f^H_{n,\gamma}$ and its wavelet coefficients are denoted $\tilde\beta^H = (\tilde\beta^H_\lambda)_{\lambda \in \Lambda}$. Thresholding procedures
have been introduced by Donoho and Johnstone [17]. The main idea of [17] is that it is sufficient to
keep a small number of the coefficients to have a good estimation of the function $f$. The threshold
$\eta_{\lambda,\gamma}$ seems to be defined in a rather complicated manner but is in fact inspired by the universal
threshold proposed by [17] in the Gaussian regression framework. The universal threshold of [17]
is defined by $\eta^U = \sqrt{2\sigma^2 \log n}$, where $\sigma^2$ (assumed to be known) is the variance of each noisy
wavelet coefficient. In our set-up, $V_{\lambda,n} = \mathrm{Var}(\hat\beta_\lambda)$ depends on $f$, so it is estimated by $\tilde V_{\lambda,n}$. Remark
that for fixed $\lambda$, when there exists a constant $c_0 > 0$ such that $f(x) \ge c_0$ for $x$ in the support of
$\varphi_\lambda$ and if $\|\varphi_\lambda\|_\infty^2 = o_n(n(\log n)^{-1})$, the deterministic term of (1.4) is negligible with respect to the
random one and we have asymptotically

$$\eta_{\lambda,\gamma} \approx \sqrt{2\gamma \tilde V_{\lambda,n} \log n},$$

which looks like the universal threshold expression if $\gamma$ is close to 1. Actually, the deterministic
term of (1.4) makes it possible to consider $\gamma$ close to 1 and to control large deviation terms for high resolution
levels. In the same spirit, $\hat V_{\lambda,n}$ is slightly overestimated and we consider $\tilde V_{\lambda,n}$ instead of $\hat V_{\lambda,n}$ to
define the threshold.
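
As an illustration only, the thresholding step can be sketched in Python for the Haar basis. The sketch assumes the threshold has the form $\eta_{\lambda,\gamma} = \sqrt{2\gamma \tilde V_{\lambda,n} \log n} + \gamma \log n \, \|\varphi_\lambda\|_\infty / (3n)$, with $\tilde V_{\lambda,n} = \hat V_{\lambda,n} + \sqrt{2\gamma \log n \, \hat V_{\lambda,n} \|\varphi_\lambda\|_\infty^2 / n^2} + 3\gamma \log n \, \|\varphi_\lambda\|_\infty^2 / n^2$ and $\hat V_{\lambda,n} = n^{-2} \int \varphi_\lambda^2 \, dN$. The function name `haar_threshold_coefficients` and its arguments are hypothetical, and $n \ge 2$ is assumed so that $\log n > 0$.

```python
import math
from collections import defaultdict


def haar_threshold_coefficients(points, n, gamma, j0):
    """Return {(j, k): beta_tilde} for Haar levels -1 <= j <= j0.

    Assumed conventions: phi = 1 on [0, 1]; psi = +1 on (1/2, 1] and -1 on
    [0, 1/2]; psi_{j,k}(x) = 2^{j/2} psi(2^j x - k). Requires n >= 2.
    """
    kept = {}
    log_n = math.log(n)
    for j in range(-1, j0 + 1):
        scale = 1.0 if j == -1 else 2.0 ** j      # support has length 1 / scale
        sup_norm = math.sqrt(scale)               # sup norm of phi_lambda
        sums = defaultdict(float)                 # sum of phi_lambda(T) over points T
        counts = defaultdict(int)                 # number of points in the support
        for t in points:
            y = t * scale
            k = math.floor(y)
            if j == -1:
                value = 1.0
            else:
                value = sup_norm if (y - k) > 0.5 else -sup_norm
            sums[k] += value
            counts[k] += 1
        for k, total in sums.items():
            beta_hat = total / n
            # For Haar functions, phi_lambda^2 equals sup_norm^2 on the support.
            v_hat = sup_norm ** 2 * counts[k] / n ** 2
            v_tilde = (v_hat
                       + math.sqrt(2 * gamma * log_n * v_hat * sup_norm ** 2 / n ** 2)
                       + 3 * gamma * log_n * sup_norm ** 2 / n ** 2)
            eta = math.sqrt(2 * gamma * v_tilde * log_n) + gamma * log_n * sup_norm / (3 * n)
            if abs(beta_hat) >= eta:              # keep only coefficients above the threshold
                kept[(j, k)] = beta_hat
    return kept
```

Only indices whose support contains at least one observation are visited, since all other empirical coefficients vanish and are therefore thresholded. A single isolated point never survives: its coefficient has size $\|\varphi_\lambda\|_\infty / n$, while the threshold is at least $\sqrt{2\gamma \log n}\,\|\varphi_\lambda\|_\infty / n$.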



The performance of universal thresholding by using the oracle point of view is studied in [17].
In the context of wavelet function estimation by thresholding, the oracle does not tell us the true
function, but tells us the coefficients that have to be kept. This "estimator" obtained with the aid
of an oracle is not a true estimator, of course, since it depends on $f$. But it represents an ideal
for the particular estimation method. The goal of the oracle approach is to derive true estimators
which can essentially "mimic" the performance of the "oracle estimator". For Gaussian regression,
[17] proved that universal thresholding leads to an estimator that satisfies an oracle inequality:
more precisely, the risk of the universal thresholding rule is not larger than the oracle risk up to
some logarithmic term which is the price to pay for not having extra information on the locations
of the coefficients to keep. So the main question is: does $\tilde f_{n,\gamma}$ satisfy a similar oracle inequality? In
our framework, it is easy to see that the oracle estimate is $\bar f = \sum_{\lambda \in \Gamma_n} \bar\beta_\lambda \tilde\varphi_\lambda$, where for any $\lambda \in \Gamma_n$,
$\bar\beta_\lambda = \hat\beta_\lambda 1_{\{\beta_\lambda^2 > V_{\lambda,n}\}}$ and we have

$$\mathbb{E}\big((\bar\beta_\lambda - \beta_\lambda)^2\big) = \min(\beta_\lambda^2, V_{\lambda,n}).$$

By keeping the coefficients $\hat\beta_\lambda$ larger than the thresholds defined in (1.4), our estimator has a risk that
is not larger than the oracle risk, up to a logarithmic term, as stated by the following key result.

Theorem 1. Let us consider a biorthogonal wavelet basis satisfying the properties described in
Section 2.2. Let us fix two constants $c \ge 1$ and $c' \in \mathbb{R}$, and let us define for any $n$, $j_0 = j_0(n)$
the integer such that $2^{j_0} \le n^c (\log n)^{c'} < 2^{j_0+1}$. If $\gamma > c$, then $\tilde f_{n,\gamma}$ satisfies the following oracle
inequality: for $n$ large enough

$$\mathbb{E}\big(\|\tilde f_{n,\gamma} - f\|_2^2\big) \le C_1 \left[ \sum_{\lambda \in \Gamma_n} \min(\beta_\lambda^2, V_{\lambda,n} \log n) + \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2 \right] + \frac{C_2}{n} \qquad (1.6)$$

where $C_1$ is a positive constant depending only on $\gamma$, $c$ and the functions that generate the biorthogonal
wavelet basis. $C_2$ is also a positive constant depending on $\gamma$, $c$, $c'$, $\|f\|_1$ and the functions that
generate the basis.
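
For concreteness, the resolution level $j_0(n)$ of Theorem 1 can be computed as follows. This is a minimal sketch that assumes the defining inequality reads $2^{j_0} \le n^c (\log n)^{c'} < 2^{j_0+1}$; the function name `resolution_level` is hypothetical and $n \ge 2$ is assumed.

```python
import math


def resolution_level(n, c=1.0, c_prime=0.0):
    """Integer j0 with 2**j0 <= n**c * (log n)**c_prime < 2**(j0 + 1), for n >= 2."""
    bound = n ** c * math.log(n) ** c_prime
    j0 = math.floor(math.log2(bound))
    # Guard against floating-point rounding near exact powers of two.
    while 2.0 ** (j0 + 1) <= bound:
        j0 += 1
    while 2.0 ** j0 > bound:
        j0 -= 1
    return j0
```

For instance, with $c = 1$ and $c' = 0$, `resolution_level(1000)` returns 9 since $2^9 = 512 \le 1000 < 1024$.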



Adaptive thresholding estimation of a Poisson intensity 






Note that Theorem 1 holds with $c = 1$ and $\gamma > 1$. Following the oracle point of view of Donoho
and Johnstone, Theorem 1 shows that our procedure is near optimal. The lack of optimality is due
to the logarithmic factor. But this term is in some sense unavoidable, as shown later in Theorem 
[U Now, let us discuss the near optimality of our procedure from some other perspectives. 



1.2 Discussion on the assumptions 

Previously, we explained why it is crucial to provide theoretical results under very mild assumptions
on $f$. Observe that Theorem 1 is established by only assuming that $f$ belongs to $\mathbb{L}_1(\mathbb{R})$ (to ensure
that $N_{\mathbb{R}} < \infty$ almost surely) and $f$ belongs to $\mathbb{L}_2(\mathbb{R})$ (to obtain the wavelet decomposition and to
study the performance of $\tilde f_{n,\gamma}$ under the $\mathbb{L}_2$-loss). In particular, $f$ can be unbounded and nothing
is said about its support, which can be unknown or even infinite. The goal of this section is to discuss
this last point since, most of the time, estimation is performed by assuming that the intensity has
a compact support known by the statistician, usually $[0, 1]$. Of course, most Poisson data
are not generated by an intensity supported by $[0, 1]$ and statisticians know this fact, but they have
in mind a simple preprocessing that can be described as follows. Let us assume that we know a
constant $M$ such that the support of $f$ is contained in $[0, M]$. Then, observations are rescaled by
dividing each of them by $M$ and the new observations (that all depend on $M$) belong to $[0, 1]$. An
estimator adapted to signals supported by $[0, 1]$ can be computed, which leads to a final estimator
of $f$ supported by $[0, M]$ by applying the inverse rescaling. Note that such an estimator highly
depends on $M$.

Let us go further by describing the situations that may be encountered. If the observations are 
physical measures given by an instrument that has a limited capacity, then the practitioner usually 
knows $M$. In this case, if the observations are not concentrated close to 0 but are spread over the
whole interval [0, M] in a homogeneous way, then the previous rescaling method performs well. But 
if one does not have access to M then we are forced in the previous method to estimate it, usually 
by the largest observation. Then one is forced to face the problem that two different experiments 
will not lead to estimators with the same support or defined at the same scale and hence it will be 
hard to compare them. Note also that, to our knowledge, sharp asymptotic properties of such
rescaling estimators depending on the largest observation have not been studied. In particular, this 
method does not seem to be robust if the observations are not compactly supported and if their 
distribution is heavy-tailed. This situation happens for instance in the financial and geological 
examples mentioned previously (see 29, H, 13]) but also in a wide variety of situations (see p3]). 



In these cases, if observations are rescaled by the largest one, then the methods described at the
beginning of the paper provide a very rough estimate of $f$ on small intervals close to 0. However,
most of the observations may be concentrated close to 0 (for instance for geological data, see [22]) and
sharp local estimation at 0 may be of interest. To overcome this problem, statisticians with the
help of experts can truncate the data and estimate the intensity on a smaller interval $[0, M_{cut}]$
corresponding to the interval of interest. Then, they face the problem that $M_{cut}$ may be random,
subjective, may change from a set of data to another one and may omit values with a potential 
interest in the future. 

So, even if partial solutions exist to overcome issues addressed by the support of /, they need a 
special preprocessing and are not completely justified from a theoretical point of view. We propose 
a procedure that ignores this preprocessing and which is adapted to non compactly supported 
Poisson intensities. Our procedure is simple (simpler than the preprocessing described previously) 
and we prove in the sequel that our method is adaptive minimax with respect to the support which 
can be bounded or not. 






P. Reynaud-Bouret and V. Rivoirard 



1.3 Optimality of $\tilde f_{n,\gamma}$ under the minimax approach

To the best of our knowledge, minimax rates for Poisson intensity estimation have not been investigated
when the intensity is not compactly supported. But let us mention results established in the
following close set-up: the problem of estimating a non-compactly supported density based on the
observations of an $n$-sample, which has been partly solved from the minimax point of view. First,
let us cite [9] where minimax results for a class of functions depending on a gauge are established
or [20] for Sobolev classes. In these papers, the loss function depends on the parameters of the
functional class. Similarly, Donoho et al. [18] proved the optimality of wavelet linear estimators
on Besov spaces $\mathcal{B}^\alpha_{p,q}$ when the $\mathbb{L}_p$-risk is considered. First general results where the loss is independent
of the functional class have been pointed out by Juditsky and Lambert-Lacroix [25] who
investigated minimax rates on the particular class of the Besov spaces $\mathcal{B}^\alpha_{\infty,\infty}$ for the $\mathbb{L}_\pi$-risk. When
$\pi > 2 + 1/\alpha$, the minimax risk is of the same order up to a logarithmic term as in the equivalent
estimation problem on $[0, 1]$. However, the behavior of the minimax risk changes dramatically when
$\pi \le 2 + 1/\alpha$, and in this case, it depends on $\pi$. Note that minimax rates for the whole class of
Besov spaces $\mathcal{B}^\alpha_{p,q}$ ($\alpha > 0$, $1 \le p, q \le \infty$) are not derived in [25]. This is the goal of Section 3 under
the $\mathbb{L}_2$-risk in the Poisson set-up.

Under mild assumptions on $\gamma$, $\alpha$, $p$, $c$ and $c'$, we prove that the maximal risk of our procedure
over balls of $\mathcal{B}^\alpha_{p,q} \cap \mathbb{L}_\infty$ is smaller than

$$\left(\frac{\log n}{n}\right)^{s} \quad \text{with} \quad s = \begin{cases} \dfrac{2\alpha}{1 + 2\alpha} & \text{if } 1 \le p \le 2, \\[2mm] \dfrac{\alpha}{1 + \alpha - \frac{1}{p}} & \text{if } 2 \le p \le +\infty. \end{cases}$$

We mention that actually for $p > 2$, it is not necessary to assume that the functions belong to
$\mathbb{L}_\infty$ to derive the rate. In addition, we derive the lower bound of the minimax risk for $\mathcal{B}^\alpha_{p,q} \cap \mathbb{L}_\infty$
that coincides with the previous upper bound up to the logarithmic term. Let us discuss these
results. We note an elbow phenomenon for the rate exponent $s$. When $p \le 2$, $s$ corresponds to
the minimax rate exponent for estimating a compactly supported intensity of a Poisson process.
Roughly speaking, it means that it is not harder to estimate non-compactly supported functions
than compactly supported functions from the minimax point of view. When $p > 2$, the rate
exponent, which depends on $p$, deteriorates when $p$ increases, which means that the support plays
a harmful role in this case. An interpretation of this fact and a long discussion of the minimax
results are proposed in Section 3.2. Let us just mention that these results are established by using
the maxiset approach presented in Section 3.1. We conclude this section by emphasizing that $\tilde f_{n,\gamma}$
is rate-optimal, up to the logarithmic term, without knowing the regularity and the support of the
underlying signal to be estimated.
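
To make the elbow at $p = 2$ concrete, the following sketch evaluates the rate exponent, assuming it reads $s = 2\alpha/(1+2\alpha)$ for $1 \le p \le 2$ and $s = \alpha/(1+\alpha-1/p)$ for $2 \le p \le +\infty$. The function name `rate_exponent` is hypothetical.

```python
import math


def rate_exponent(alpha, p):
    """Exponent s of the risk bound (log n / n)**s for smoothness alpha and index p."""
    if alpha <= 0 or p < 1:
        raise ValueError("need alpha > 0 and p >= 1")
    if p <= 2:
        return 2 * alpha / (1 + 2 * alpha)
    inv_p = 0.0 if math.isinf(p) else 1.0 / p
    return alpha / (1 + alpha - inv_p)
```

With $\alpha = 1$, the exponent equals $2/3$ for every $p \le 2$, then decreases with $p$ down to $1/2$ at $p = +\infty$. The two branches agree at $p = 2$, so the exponent is continuous in $p$.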



1.4 Overview of the paper 

Section 2 recalls properties of the Poisson process and introduces the biorthogonal wavelet bases
used in this paper. Section 3 discusses the properties of our procedure in the minimax and maxiset
approaches. Section 4 provides a very general oracle type inequality based on the model selection
approach from which Theorem 1 is derived and contains the proofs of the other results.

2 Main Tools 

2.1 Some probabilistic properties of the Poisson process 

Let us first recall some basic facts about Poisson processes.

Definition 1. Let $(\mathbb{X}, \mathcal{X})$ be a measurable space. Let $N$ be a random countable subset of $\mathbb{X}$. $N$ is
said to be a Poisson process on $(\mathbb{X}, \mathcal{X})$ if

1. for any $A \in \mathcal{X}$, the number of points of $N$ lying in $A$ is a random variable, denoted $N_A$,
which obeys a Poisson distribution with parameter $\mu(A)$, where $\mu$ is a measure on $\mathbb{X}$.

2. for any finite family of disjoint sets $A_1, \ldots, A_n$ of $\mathcal{X}$, $N_{A_1}, \ldots, N_{A_n}$ are independent random
variables.

The measure $\mu$, called the mean measure of $N$, has no atom (see [27]). In this paper, we assume
that $\mathbb{X} = \mathbb{R}$, $\mu(\mathbb{R}) < \infty$ and $\mu$ is absolutely continuous with respect to the Lebesgue measure.

As explained in the Introduction, without loss of generality, we introduce a parameter $n$ and we define
the intensity of the process as $f = \frac{d\mu_x}{n\,dx}$. We can also mention that a Poisson process $N$ is infinitely
divisible, which means that it can be written as follows: for any positive integer $k$,

$$dN = \sum_{i=1}^{k} dN_i \qquad (2.1)$$

where the $N_i$'s are mutually independent Poisson processes on $\mathbb{R}$ with mean measure $\mu/k$. The
following proposition (sometimes attributed to Campbell, see [27]) is fundamental and will be
used throughout this paper.

Proposition 1. For any measurable function $g$ and any $z \in \mathbb{R}$ such that $\int e^{z g(x)}\,d\mu_x < \infty$, one has

$$\mathbb{E}\left[\exp\left(z \int_{\mathbb{R}} g(x)\,dN_x\right)\right] = \exp\left(\int_{\mathbb{R}} \big(e^{z g(x)} - 1\big)\,d\mu_x\right).$$

So,

$$\mathbb{E}\left(\int_{\mathbb{R}} g(x)\,dN_x\right) = \int_{\mathbb{R}} g(x)\,d\mu_x, \qquad \mathrm{Var}\left(\int_{\mathbb{R}} g(x)\,dN_x\right) = \int_{\mathbb{R}} g^2(x)\,d\mu_x.$$

If $g$ is bounded, this implies the following exponential inequality. For any $u > 0$,

$$\mathbb{P}\left(\int_{\mathbb{R}} g(x)(dN_x - d\mu_x) \ge \sqrt{2u \int_{\mathbb{R}} g^2(x)\,d\mu_x} + \frac{1}{3}\|g\|_\infty u\right) \le \exp(-u). \qquad (2.2)$$

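
The mean and variance identities of Proposition 1 can be checked numerically. The following Python sketch is an illustration under simplifying assumptions: the intensity is $f = 1_{[0,1]}$, so that $N$ is a homogeneous Poisson process on $[0,1]$ with rate $n$, simulated through exponential inter-arrival times. The names `simulate_poisson_points` and `campbell_check` are hypothetical.

```python
import random


def simulate_poisson_points(rate, rng):
    """Points of a homogeneous Poisson process with the given rate on [0, 1]."""
    points = []
    t = rng.expovariate(rate)          # first arrival time
    while t <= 1.0:
        points.append(t)
        t += rng.expovariate(rate)     # independent exponential gaps
    return points


def campbell_check(g, rate, repetitions, seed=0):
    """Empirical mean and variance of the sum of g(T) over the points T of N."""
    rng = random.Random(seed)
    totals = [sum(g(t) for t in simulate_poisson_points(rate, rng))
              for _ in range(repetitions)]
    mean = sum(totals) / repetitions
    variance = sum((x - mean) ** 2 for x in totals) / (repetitions - 1)
    return mean, variance
```

With $g(x) = x$ and rate $n = 50$, Proposition 1 gives $\mathbb{E} \int g\,dN = n \int_0^1 x\,dx = 25$ and $\mathrm{Var} \int g\,dN = n \int_0^1 x^2\,dx = 50/3$, and the output of `campbell_check(lambda x: x, 50.0, 2000)` should be close to these values.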
2.2 Biorthogonal wavelet bases and Besov spaces 

In this paper, the intensity $f$ to be estimated is assumed to belong to $\mathbb{L}_1 \cap \mathbb{L}_2$. In this case, $f$ can be
decomposed on the Haar wavelet basis and this property is used throughout this paper. However,
the Haar basis suffers from a lack of regularity. To remedy this problem, in particular for deriving
minimax properties of $\tilde f_{n,\gamma}$ on Besov spaces, we consider a particular class of biorthogonal wavelet
bases that are described now. For this purpose, let us set

$$\phi = 1_{[0,1]}.$$

For any $r > 0$, there exist three functions $\psi$, $\tilde\phi$ and $\tilde\psi$ with the following properties:

1. $\tilde\phi$ and $\tilde\psi$ are compactly supported,

2. $\tilde\phi$ and $\tilde\psi$ belong to $C^{r+1}$, where $C^{r+1}$ denotes the Hölder space of order $r + 1$,

3. $\psi$ is compactly supported and is a piecewise constant function,

4. $\psi$ is orthogonal to polynomials of degree no larger than $r$,

5. $\{(\phi_k, \psi_{j,k})_{j \ge 0, k \in \mathbb{Z}}, (\tilde\phi_k, \tilde\psi_{j,k})_{j \ge 0, k \in \mathbb{Z}}\}$ is a biorthogonal family: for any $j, j' \ge 0$, for any $k, k' \in \mathbb{Z}$,

$$\int_{\mathbb{R}} \psi_{j,k}(x)\tilde\phi_{k'}(x)\,dx = \int_{\mathbb{R}} \phi_k(x)\tilde\psi_{j',k'}(x)\,dx = 0,$$

$$\int_{\mathbb{R}} \phi_k(x)\tilde\phi_{k'}(x)\,dx = 1_{k=k'}, \qquad \int_{\mathbb{R}} \psi_{j,k}(x)\tilde\psi_{j',k'}(x)\,dx = 1_{j=j',\,k=k'},$$

where for any $x \in \mathbb{R}$ and for any $(j,k) \in \mathbb{Z}^2$,

$$\phi_k(x) = \phi(x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$$

and

$$\tilde\phi_k(x) = \tilde\phi(x - k), \qquad \tilde\psi_{j,k}(x) = 2^{j/2}\tilde\psi(2^j x - k).$$

This implies the wavelet decomposition (1.1) of $f$. Such biorthogonal wavelet bases have been built
by Cohen et al. [11] as a special case of spline systems (see also the elegant equivalent construction
of Donoho [3] from boxcar functions). The Haar basis can be viewed as a particular biorthogonal
wavelet basis, by setting $\tilde\phi = \phi$ and $\tilde\psi = \psi = 1_{]\frac12, 1]} - 1_{[0, \frac12]}$, with $r = 0$ even if Property 2 is not
satisfied with such a choice. The Haar basis is an orthonormal basis, which is not true for general
biorthogonal wavelet bases. However, we have the frame property: if we denote

$$\Phi = \{\phi, \psi, \tilde\phi, \tilde\psi\},$$

there exist two constants $c_1(\Phi)$ and $c_2(\Phi)$ only depending on $\Phi$ such that

$$c_1(\Phi)\left(\sum_{k \in \mathbb{Z}} \alpha_k^2 + \sum_{j \ge 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}^2\right) \le \|f\|_2^2 \le c_2(\Phi)\left(\sum_{k \in \mathbb{Z}} \alpha_k^2 + \sum_{j \ge 0}\sum_{k \in \mathbb{Z}} \beta_{j,k}^2\right).$$

For instance, when the Haar basis is considered, $c_1(\Phi) = c_2(\Phi) = 1$. In particular, we have

$$c_1(\Phi)\|\tilde\beta - \beta\|_{\ell_2}^2 \le \|\tilde f_{n,\gamma} - f\|_2^2 \le c_2(\Phi)\|\tilde\beta - \beta\|_{\ell_2}^2. \qquad (2.3)$$

An important feature of such bases is the following: there exists a constant $\mu_\psi > 0$ such that

$$\inf_{x \in [0,1]} |\phi(x)| \ge 1, \qquad \inf_{x \in \mathrm{supp}(\psi)} |\psi(x)| \ge \mu_\psi, \qquad (2.4)$$

where $\mathrm{supp}(\psi) = \{x \in \mathbb{R} : \psi(x) \ne 0\}$. This property is used throughout the paper.

Now, let us give some properties of Besov spaces that are extensively used in the next section.
We recall that Besov spaces, denoted $\mathcal{B}^\alpha_{p,q}$ in the sequel, are defined by using the modulus of continuity
(see [14] and [21]). We just recall the sequential characterization of Besov spaces by using the
biorthogonal wavelet basis (for further details, see [13]).

Let $1 \le p, q \le \infty$ and $0 < \alpha < r + 1$. The $\mathcal{B}^\alpha_{p,q}$-norm of $f$ is equivalent to the norm

$$\|f\|_{\alpha,p,q} = \begin{cases} \|(\alpha_k)_k\|_{\ell_p} + \left[\sum_{j \ge 0} 2^{jq(\alpha + \frac12 - \frac1p)} \|(\beta_{j,k})_k\|_{\ell_p}^q\right]^{1/q} & \text{if } q < \infty, \\[2mm] \|(\alpha_k)_k\|_{\ell_p} + \sup_{j \ge 0} 2^{j(\alpha + \frac12 - \frac1p)} \|(\beta_{j,k})_k\|_{\ell_p} & \text{if } q = \infty. \end{cases}$$

We use this norm to define the radius of Besov balls. For any $R > 0$, if $0 < \alpha' \le \alpha < r + 1$,
$1 \le p \le p' \le \infty$ and $1 \le q \le q' \le \infty$, we obviously have

$$\mathcal{B}^\alpha_{p,q}(R) \subset \mathcal{B}^\alpha_{p,q'}(R), \qquad \mathcal{B}^\alpha_{p,q}(R) \subset \mathcal{B}^{\alpha'}_{p,q}(R).$$

Moreover

$$\mathcal{B}^\alpha_{p,q}(R) \subset \mathcal{B}^{\alpha'}_{p',q}(R) \quad \text{if } \alpha - \frac1p \ge \alpha' - \frac{1}{p'}. \qquad (2.5)$$

The class of Besov spaces $\mathcal{B}^\alpha_{p,\infty}$ provides a useful tool to classify wavelet decomposed signals with
respect to their regularity and sparsity properties (see [Hi]). Roughly speaking, regularity increases
when $\alpha$ increases whereas sparsity increases when $p$ decreases (see Section 3.2).
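
As a small illustration of the sequential characterization, the following sketch evaluates the sequence norm $\|(\alpha_k)_k\|_{\ell_p}$ plus the weighted $\ell_q$ norm over levels $j$ of $2^{j(\alpha + 1/2 - 1/p)} \|(\beta_{j,k})_k\|_{\ell_p}$, for finitely many nonzero coefficients. The function names and the list-of-lists layout of the coefficients are assumptions made for the example.

```python
import math


def lp_norm(values, p):
    """l_p norm of a finite sequence, with p = math.inf giving the sup norm."""
    if not values:
        return 0.0
    if math.isinf(p):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def besov_sequence_norm(alphas, betas, alpha, p, q):
    """Sequence-space Besov norm.

    alphas: list of the coefficients alpha_k.
    betas: list indexed by j = 0, 1, ... of the lists of coefficients beta_{j,k}.
    """
    inv_p = 0.0 if math.isinf(p) else 1.0 / p
    # One weighted l_p norm per resolution level j.
    weighted = [2.0 ** (j * (alpha + 0.5 - inv_p)) * lp_norm(level, p)
                for j, level in enumerate(betas)]
    # The l_q norm over levels covers both cases q < infinity and q = infinity.
    return lp_norm(alphas, p) + lp_norm(weighted, q)
```

For example, with `alphas = [1.0]`, `betas = [[1.0], [0.5, 0.5]]`, $\alpha = 1/2$ and $p = 2$, each level contributes a weighted norm equal to 1, so the norm is 2 for $q = \infty$ and 3 for $q = 1$ (up to floating-point rounding).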

3 Minimax results via the maxiset study 

We present in this section the minimax results stated in the Introduction. These minimax results
are deduced from maxiset results that are presented first. Subsection 3.1 can be omitted on first
reading.

3.1 The maxiset approach

First, let us describe the maxiset approach which is classical in approximation theory and has been
initiated in statistics by Kerkyacharian and Picard [26]. For this purpose, let us assume that we are
given $f^*$ an estimation procedure. The maxiset study of $f^*$ consists in deciding the accuracy of $f^*$
by fixing a prescribed rate $\rho^*$ and in pointing out all the functions $f$ such that $f$ can be estimated
by the procedure $f^*$ at the target rate $\rho^*$. The maxiset of the procedure $f^*$ for this rate $\rho^*$ is the
set of all these functions. More precisely, we restrict our study to the signals belonging to $\mathbb{L}_1 \cap \mathbb{L}_2$
and we set:

Definition 2. Let $\rho^* = (\rho^*_n)_n$ be a decreasing sequence of positive real numbers and let $f^* = (f^*_n)_n$
be an estimation procedure. The maxiset of $f^*$ associated with the rate $\rho^*$ and the $\mathbb{L}_2$-loss is

$$MS(f^*, \rho^*) = \left\{ f \in \mathbb{L}_1 \cap \mathbb{L}_2 : \sup_n \left[ (\rho^*_n)^{-2}\, \mathbb{E}\|f^*_n - f\|_2^2 \right] < +\infty \right\},$$

the ball of radius $R > 0$ of the maxiset is defined by

$$MS(f^*, \rho^*)(R) = \left\{ f \in \mathbb{L}_1 \cap \mathbb{L}_2 : \sup_n \left[ (\rho^*_n)^{-2}\, \mathbb{E}\|f^*_n - f\|_2^2 \right] \le R^2 \right\}.$$

So, the outcome of the maxiset approach is a functional space, which can be viewed as an 
inversion of the minimax theory where an a priori functional assumption is needed. Obviously, the 
larger the maxiset, the better the procedure. Maxiset results have been established and extensively 
discussed in different settings for many classes of estimators and for various rates of convergence. 
Let us cite for instance [2^], 0] and [a] for respectively thresholding rules, Bayes procedures and 
kernel estimators. More interestingly in our framework, [3] derived maxisets for thresholding rules 
with data-driven thresholds for density estimation. 

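The goal of this section is to investigate maxisets for $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ and we only focus on rates
of the form $\rho_s = (\rho_{n,s})_n$, where $0 < s < \frac12$ and for any $n$,

$$\rho_{n,s} = \left(\frac{\log n}{n}\right)^{s}.$$

So, in the sequel, we investigate for any radius $R > 0$:

$$MS(\tilde f_\gamma, \rho_s)(R) = \left\{ f \in \mathbb{L}_1 \cap \mathbb{L}_2 : \sup_n \left[ \left(\frac{\log n}{n}\right)^{-2s} \mathbb{E}\|\tilde f_{n,\gamma} - f\|_2^2 \right] \le R^2 \right\}$$

and to avoid tedious technical aspects related to the radius of balls, we use the following notation. If
$\mathcal{F}_s$ is a given space,

$$MS(\tilde f_\gamma, \rho_s) :=: \mathcal{F}_s$$

means in the sequel that for any $R > 0$, there exists $R' > 0$ such that

$$MS(\tilde f_\gamma, \rho_s)(R) \cap \mathbb{L}_1(R) \cap \mathbb{L}_2(R) \subset \mathcal{F}_s(R') \cap \mathbb{L}_1(R) \cap \mathbb{L}_2(R)$$

and for any $R' > 0$, there exists $R > 0$ such that

$$\mathcal{F}_s(R') \cap \mathbb{L}_1(R') \cap \mathbb{L}_2(R') \subset MS(\tilde f_\gamma, \rho_s)(R) \cap \mathbb{L}_1(R') \cap \mathbb{L}_2(R').$$

To characterize maxisets of $\tilde f_\gamma$, we set for any $\lambda \in \Lambda$, $\sigma_\lambda^2 = \int \varphi_\lambda^2(x) f(x)\,dx$ and we introduce the
following spaces.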

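Definition 3. We define for all $R > 0$ and for all $0 < s < \frac12$,

$$W_s = \left\{ f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_{t > 0} t^{-4s} \sum_{\lambda \in \Lambda} \beta_\lambda^2 1_{\{|\beta_\lambda| \le \sigma_\lambda t\}} < \infty \right\},$$

the ball of radius $R$ associated with $W_s$ is:

$$W_s(R) = \left\{ f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_{t > 0} t^{-4s} \sum_{\lambda \in \Lambda} \beta_\lambda^2 1_{\{|\beta_\lambda| \le \sigma_\lambda t\}} \le R^{2-4s} \right\},$$

and for any sequence of spaces $\Gamma = (\Gamma_n)_n$ included in $\Lambda$,

$$B^s_{2,\Gamma} = \left\{ f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_n \left[ \left(\frac{\log n}{n}\right)^{-2s} \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2 \right] < \infty \right\}$$

and

$$B^s_{2,\Gamma}(R) = \left\{ f = \sum_{\lambda \in \Lambda} \beta_\lambda \tilde\varphi_\lambda : \sup_n \left[ \left(\frac{\log n}{n}\right)^{-2s} \sum_{\lambda \notin \Gamma_n} \beta_\lambda^2 \right] \le R^2 \right\}.$$
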

These spaces just depend on the coefficients of the biorthogonal wavelet expansion. In 14j], a 
justification of the form of the radius of W s and further details are provided. These spaces can be 
viewed as weak versions of classical Besov spaces, hence they are denoted in the sequel weak Besov 
spaces. Note that if for all $n$,

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

with

$$2^{j_0} \le \left(\frac{n}{\log n}\right)^{c} < 2^{j_0+1}, \qquad c > 0,$$

then $B^s_{2,\Gamma}$ is the classical Besov space $\mathcal{B}^{s/c}_{2,\infty}$ if the reconstruction wavelets are regular enough. We
have the following result.

Theorem 2. Let us fix two constants $c \ge 1$ and $c' \in \mathbb{R}$, and let us define for any $n$, $j_0 = j_0(n)$ the
integer such that $2^{j_0} \le n^c (\log n)^{c'} < 2^{j_0+1}$. Let $\gamma > c$. Then, the procedure defined in (1.5) with
the sequence $\Gamma = (\Gamma_n)_n$ such that

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

achieves the following maxiset performance: for all $0 < s < \frac12$,

$$MS(\tilde f_\gamma, \rho_s) :=: B^s_{2,\Gamma} \cap W_s.$$

In particular, if $c' = -c$ and $0 < s c^{-1} < r + 1$, where $r$ is the parameter of the biorthogonal basis
introduced in Section 2.2,

$$MS(\tilde f_\gamma, \rho_s) :=: \mathcal{B}^{s/c}_{2,\infty} \cap W_s.$$

The maxiset of $\tilde f_\gamma$ is characterized by two spaces: a weak Besov space that is directly connected
to the thresholding nature of $\tilde f_\gamma$ and the space $B^s_{2,\Gamma}$ that handles the coefficients that are not
estimated, which corresponds to the indices $j > j_0$. This maxiset result is similar to the result
obtained by Autin [2j] in the density estimation setting but our assumptions are less restrictive (see 
Theorem 5.1 of 0]). 

Now, let us point out a family of examples of functions that illustrates the previous result. For
this purpose, we only consider the Haar basis, which yields simple formulas for the wavelet
coefficients. Let us consider for any $0 < \beta < \frac12$, $f_\beta$ such that, for any $x \in \mathbb{R}$,

The following result points out that if $s$ is small enough, $f_\beta$ belongs to $MS(\tilde f_\gamma, \rho_s)$ (so $f_\beta$ can be
estimated at the rate $\rho_s$), and in addition $f_\beta \notin \mathbb{L}_\infty$. This result illustrates the fact that the classical
assumption $\|f\|_\infty < \infty$ is not necessary to estimate $f$ by our procedure.

Proposition 2. We consider the Haar basis and we set $c' = -c$. For $0 < s < 1/6$, under the
assumptions of Theorem^ if 

0</3< i(l-6s), 

then for c large enough, 

$$f_\beta \in MS(\tilde f^H_\gamma, \rho_s).$$

Let us end this section by explaining the links between maxiset and minimax theories. For this
purpose, let $\mathcal{F}$ be a functional space and $\mathcal{F}(R)$ be the ball of radius $R$ associated with $\mathcal{F}$. $\mathcal{F}(R)$
is assumed to be included in a ball of $\mathbb{L}_1 \cap \mathbb{L}_2$. The procedure $\tilde f_\gamma$ is said to achieve the rate $\rho_s$ on
$\mathcal{F}(R)$ if

$$\sup_n \left[ (\rho_{n,s})^{-2} \sup_{f \in \mathcal{F}(R)} \mathbb{E}\|\tilde f_{n,\gamma} - f\|_2^2 \right] < \infty.$$

So, obviously, $\tilde f_\gamma$ achieves the rate $\rho_s$ on $\mathcal{F}(R)$ if and only if there exists $R' > 0$ such that

$$\mathcal{F}(R) \subset MS(\tilde f_\gamma, \rho_s)(R') \cap \mathbb{L}_1(R') \cap \mathbb{L}_2(R').$$

Using previous results, if $c' = -c$ and if properties of regularity and vanishing moments are satisfied
by the wavelet basis, this is satisfied if and only if there exists $R'' > 0$ such that

$$\mathcal{F}(R) \subset \mathcal{B}^{s/c}_{2,\infty}(R'') \cap W_s(R'') \cap \mathbb{L}_1(R'') \cap \mathbb{L}_2(R''). \qquad (3.1)$$

This simple observation will be used to prove some minimax statements of the next section.

3.2 Minimax results

To the best of our knowledge, the minimax rate is unknown for $\mathcal{B}^\alpha_{p,q}$ when $p < \infty$. Let us investigate
this problem by pointing out the minimax properties of $\tilde f_\gamma$ on $\mathcal{B}^\alpha_{p,q}$. For this purpose, we consider
the procedure $\tilde f_\gamma = (\tilde f_{n,\gamma})_n$ defined with

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

and $j_0 = j_0(n)$ is the integer such that

$$2^{j_0} \le n^c (\log n)^{-c} < 2^{j_0+1}.$$

The real number $c$ is chosen later. We also set for any $R > 0$,

$$\mathcal{L}_{1,2,\infty}(R) = \{f : \|f\|_1 \le R,\ \|f\|_2 \le R,\ \|f\|_\infty \le R\}.$$

In the sequel, minimax results depend on the parameter $r$ of the biorthogonal basis introduced in
Section 2.2 to measure the regularity of the reconstruction wavelets $(\tilde\phi, \tilde\psi)$. We first consider the
case $p \le 2$.

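Theorem 3. Let $R, R' > 0$, $1 \le p \le 2$, $1 \le q \le \infty$ and $\alpha \in \mathbb{R}$ such that $\max\left(0, \frac1p - \frac12\right) < \alpha < r + 1$.
Let $c \ge 1$ be large enough such that

$$\alpha\left(1 - \frac{1}{c(1 + 2\alpha)}\right) \ge \frac1p - \frac12.$$

If $\gamma > c$, then for any $n$,

$$\sup_{f \in \mathcal{B}^\alpha_{p,q}(R) \cap \mathcal{L}_{1,2,\infty}(R')} \mathbb{E}\big(\|\tilde f_{n,\gamma} - f\|_2^2\big) \le C(\gamma, c, R, R', \alpha, p, \Phi) \left(\frac{\log n}{n}\right)^{\frac{2\alpha}{2\alpha+1}} \qquad (3.2)$$

where $C(\gamma, c, R, R', \alpha, p, \Phi)$ depends on $R'$, $\gamma$, $c$, on the parameters of the Besov ball and on $\Phi$.

When $p \le 2$, the rate of the risk of $\tilde f_{n,\gamma}$ corresponds to the minimax rate (up to the logarithmic
term) for estimation of a compactly supported intensity of a Poisson process (see [30]), or for
estimation of a compactly supported density (see [3]). Roughly speaking, it means that it is
not harder to estimate non-compactly supported functions than compactly supported functions
from the minimax point of view. In addition, the procedure $\tilde f_\gamma$ achieves this classical rate up to
a logarithmic term. When $p > 2$ these conclusions do not remain true and we have the following
result.

Theorem 4. Let $R, R' > 0$, $2 \le p \le \infty$, $1 \le q \le \infty$ and $\alpha \in \mathbb{R}$ such that $0 < \alpha < r + 1$. Let $c \ge 1$.
If $\gamma > c$, then for any $n$,

$$\sup_{f \in \mathcal{B}^\alpha_{p,q}(R) \cap \mathbb{L}_1(R') \cap \mathbb{L}_2(R')} \mathbb{E}\big(\|\tilde f_{n,\gamma} - f\|_2^2\big) \le C(\gamma, c, R, R', \alpha, p, \Phi) \left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha + 1 - \frac1p}} \qquad (3.3)$$

where $C(\gamma, c, R, R', \alpha, p, \Phi)$ depends on $R'$, $\gamma$, $c$, on the parameters of the Besov ball and on $\Phi$.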






For $p > 2$, we can note that it is not necessary to assume that signals to be estimated belong
to $L_\infty$ to derive rates of convergence for the risk. Note that when $p = \infty$, the risk is bounded
by $\left(\frac{\log n}{n}\right)^{\frac{\alpha}{1+\alpha}}$ up to a constant. In the density estimation setting, this rate was also derived by
[25] for their thresholding procedure whose risk was studied on $\mathcal B^\alpha_{\infty,\infty}(R)$. Now, combining upper
bounds (3.2) and (3.3), for any $R, R' > 0$, $1 \le p \le \infty$, $1 \le q \le \infty$ and $\alpha \in \mathbb R$ such that
$\max\left(0, \frac1p - \frac12\right) < \alpha < r + 1$, we have:

$$\sup_{f \in \mathcal B^\alpha_{p,q}(R) \cap \mathcal L_{1,2,\infty}(R')} E\left(\|\tilde f_{n,\gamma} - f\|_2^2\right) \le C(\gamma, c, R, R', \alpha, p, \Phi) \left(\frac{\log n}{n}\right)^{\frac{\alpha}{\alpha + \frac12 + \left(\frac12 - \frac1p\right)_+}}$$

under the assumptions of Theorem 3. The following result derives lower bounds of the minimax risk
and states that $\tilde f_{n,\gamma}$ is rate-optimal up to a logarithmic term.



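Theorem 5. Let $R, R' > 0$, $1 \le p \le \infty$, $1 \le q \le \infty$ and $\alpha \in \mathbb R$ such that $\max\left(0, \frac1p - \frac12\right) < \alpha < r + 1$. Then,

$$\liminf_{n \to +\infty} n^{\frac{\alpha}{\alpha + \frac12 + \left(\frac12 - \frac1p\right)_+}} \inf_{\hat f_n} \sup_{f \in \mathcal B^\alpha_{p,q}(R) \cap \mathcal L_{1,2,\infty}(R')} E\left(\|\hat f_n - f\|_2^2\right) \ge C(\gamma, c, R, R', \alpha, p, \Phi)$$

where $C(\gamma, c, R, R', \alpha, p, \Phi)$ depends on $R'$, $\gamma$, $c$, on the parameters of the Besov ball and on $\Phi$.
Furthermore, let $p^* \ge 1$ and $\alpha^* > 0$ such that

$$\alpha^*\left(1 - \frac{1}{c(1+2\alpha^*)}\right) \ge \frac{1}{p^*} - \frac12. \qquad (3.4)$$

Then, $\tilde f_\gamma$ is adaptive minimax up to a logarithmic term on

$$\left\{\mathcal B^\alpha_{p,q}(R) \cap \mathcal L_{1,2,\infty}(R') : \alpha^* \le \alpha < r + 1,\ p^* \le p \le +\infty,\ 1 \le q \le \infty\right\}.$$

Table 1 gathers minimax rates (up to a logarithmic term) obtained for each situation.

                        $1 \le p \le 2$                      $2 \le p \le \infty$
compact support         $n^{-\frac{2\alpha}{2\alpha+1}}$     $n^{-\frac{2\alpha}{2\alpha+1}}$
non compact support     $n^{-\frac{2\alpha}{2\alpha+1}}$     $n^{-\frac{\alpha}{\alpha+1-\frac1p}}$

Table 1: Minimax rates on $\mathcal B^\alpha_{p,q} \cap \mathcal L_{1,2,\infty}$ (up to a logarithmic term) with $1 \le p, q \le \infty$, $\alpha > \max\left(0, \frac1p - \frac12\right)$ under the $L_2$-loss.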
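The entries of Table 1 can be evaluated numerically. The following Python sketch (the function name `rate_exponent` is a hypothetical helper introduced only for illustration) returns the exponent $e$ such that the minimax risk behaves like $n^{-e}$ up to logarithmic factors, and makes the elbow at $p = 2$ visible.

```python
def rate_exponent(alpha: float, p: float, compact_support: bool) -> float:
    """Exponent e such that the minimax L2-risk is of order n**(-e), up to log terms.

    Follows Table 1: the rate depends on p only for non compactly
    supported signals with p > 2.
    """
    if p < 1:
        raise ValueError("need p >= 1")
    if alpha <= max(0.0, 1.0 / p - 0.5):
        raise ValueError("need alpha > max(0, 1/p - 1/2)")
    if compact_support or p <= 2:
        return 2.0 * alpha / (2.0 * alpha + 1.0)
    return alpha / (alpha + 1.0 - 1.0 / p)
```

For a fixed $\alpha$, the non compact exponent decreases from $2\alpha/(2\alpha+1)$ at $p = 2$ to $\alpha/(\alpha+1)$ at $p = \infty$ (passing `float("inf")` for `p` works since `1.0 / p` is then `0.0`).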



Our results show the influence of the support on minimax rates. Note that when restricting
to compactly supported signals, when $p > 2$, $\mathcal B^\alpha_{p,\infty}(R) \subset \mathcal B^\alpha_{2,\infty}(\tilde R)$ for $\tilde R$ large enough and in this
case, the rate does not depend on $p$. This is not the case when non-compactly supported signals are
considered. Actually, we note an elbow phenomenon at $p = 2$ and the rate deteriorates when $p$
increases. Let us give an interpretation of this observation. Johnstone (1994) showed that when
$p < 2$, Besov spaces $\mathcal B^\alpha_{p,q}$ model sparse signals where at each level, very few of the wavelet
coefficients are non-negligible. But these coefficients can be very large. When $p > 2$, $\mathcal B^\alpha_{p,q}$-spaces
typically model dense signals where the wavelet coefficients are not large but most of them can be
non-negligible. This explains why the size of the support plays a role for minimax rates as soon
as $p > 2$: when the support is larger, the number of wavelet coefficients to be estimated increases
dramatically.

Finally, we note that our procedure achieves the minimax rate, up to a logarithmic term. This
logarithmic term is the price we pay for considering thresholding rules. In addition, $\tilde f_\gamma$ is near
rate-optimal without knowing the regularity and the support of the underlying signal to be estimated.

We end this section by proving that our procedure is adaptive minimax (with the exact exponent
of the logarithmic factor) over weak Besov spaces introduced in Section 3.1. For this purpose, we
consider signals decomposed on the Haar basis, and we establish the following lower bound with
respect to $W_s$. We recall that for any $0 < s < \frac12$, $\rho_{n,s} = \left(\frac{\log n}{n}\right)^s$.




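Theorem 6. We consider the Haar basis (the spaces $W_s$ and $\mathcal B^s_{2,\Gamma}$ introduced in Section 3.1 are
viewed as sequence spaces). Let

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

with $j_0 = j_0(n)$ the integer such that

$$2^{j_0} \le n(\log n)^{-1} < 2^{j_0+1}.$$

For $0 < s < \frac12$ and $R, R', R'' > 0$ such that $R'' \ge 1$ and $R' \ge R^{1-2s} \ge 1$, we have

$$\liminf_{n \to \infty} \rho_{n,s}^{-2} \inf_{\hat f_n} \sup_{f \in W_s(R) \cap \mathcal B^s_{2,\Gamma}(R') \cap \mathcal L_{1,2,\infty}(R'')} E\left(\|\hat f_n - f\|_2^2\right) \ge C(s) R^{2-4s},$$

where $C(s)$ depends only on $s$ and $\Phi$.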

Using Theorem 2, which provides an upper bound for the risk of our procedure, we immediately
deduce the following result.

Corollary 1. The procedure $\tilde f_\gamma$ defined with

$$\Gamma_n = \{\lambda = (j,k) \in \Lambda : j \le j_0\}$$

with $j_0 = j_0(n)$ the integer such that $2^{j_0} \le n(\log n)^{-1} < 2^{j_0+1}$ and with $\gamma > 1$ is minimax on
$W_s(R) \cap \mathcal B^s_{2,\Gamma}(R') \cap \mathcal L_{1,2,\infty}(R'')$ and is adaptive minimax on

$$\left\{W_s(R) \cap \mathcal B^s_{2,\Gamma}(R') \cap \mathcal L_{1,2,\infty}(R'') : 0 < s < \tfrac12,\ 1 \le R'',\ 1 \le R \le R'\right\}.$$



4 Proofs via the model selection approach 

In this section, we use the model selection approach to provide a very general result with respect
to the estimation of a countable family of coefficients. This result is stated in Theorem 7 and is
valid for various settings. Applied to the Poisson setting, it allows us to establish Theorem 1.

4.1 Connections between thresholding and model selection

To describe the model selection approach, let us introduce the following empirical contrast: for any
family $a = (a_\lambda)_{\lambda \in \Lambda}$, we set

$$C_n(a) = -2 \sum_{\lambda \in \Lambda} a_\lambda \hat\beta_\lambda + \sum_{\lambda \in \Lambda} a_\lambda^2, \qquad (4.1)$$

which is an unbiased estimator of $C(a) = \|a - \beta\|_{\ell_2}^2 - \|\beta\|_{\ell_2}^2$. Note that the minimum of $C$ is achieved
for $a = \beta$. Model selection proceeds in two steps: first we consider some family of models $m \subset \Lambda$
and we find $\hat\beta(m)$, the minimum of $C_n$ on each model $m$. Then, we use the data to select a value $\hat m$
of $m$ and we take $\hat\beta(\hat m)$ as the final estimator. The first step is immediate in our setting: for any
$m \subset \Lambda$,

$$\hat\beta(m) = (\hat\beta_\lambda 1_{\{\lambda \in m\}})_{\lambda \in \Lambda}$$

and $C_n(\hat\beta(m)) = -\sum_{\lambda \in m} \hat\beta_\lambda^2$. Now, the question is: how to choose $\hat m$? One could be tempted to
choose $\hat m$ as large as possible but this choice would lead to estimates with infinite variance. For
this reason, Birgé and Massart 0] proposed to introduce a penalty term associated to each model
$m$, denoted $\mathrm{pen}(m)$, and to choose $\hat m$ by minimizing

$$\mathrm{Crit}(m) = -\sum_{\lambda \in m} \hat\beta_\lambda^2 + \mathrm{pen}(m)$$

over a large class of possible models $m$. For instance, we can fix $\Gamma$, a deterministic subset of $\Lambda$,
and consider all the subsets of $\Gamma$. The role of the function $m \mapsto \mathrm{pen}(m)$ is to govern the classical
bias-variance tradeoff. Now, if we consider a family of thresholds $(\eta_\lambda)_{\lambda \in \Lambda}$ and if we set for any
$m \subset \Gamma$

$$\mathrm{pen}(m) = \sum_{\lambda \in m} \eta_\lambda^2,$$

then the model selection procedure is equivalent to the thresholding rule associated with the family
$(\eta_\lambda)_{\lambda \in \Lambda}$:

$$\hat m = \{\lambda \in \Gamma : |\hat\beta_\lambda| \ge \eta_\lambda\}$$

and

$$\hat\beta(\hat m) = (\hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_\lambda\}} 1_{\{\lambda \in \Gamma\}})_{\lambda \in \Lambda} = \tilde\beta.$$
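The equivalence between penalized model selection and thresholding can be checked by brute force on a small index set. The sketch below is only an illustration under simplifying assumptions: $\Gamma$ is a finite list of indices, the coefficient estimates and thresholds are plain numbers, and the function names are hypothetical. It minimizes $\mathrm{Crit}(m) = \sum_{\lambda \in m} (\eta_\lambda^2 - \hat\beta_\lambda^2)$ over all subsets and compares the result with the set of coefficients exceeding their threshold.

```python
from itertools import combinations

def crit(model, beta_hat, eta):
    """Penalized criterion: -sum(beta_hat^2) + sum(eta^2) over the model."""
    return sum(eta[i] ** 2 - beta_hat[i] ** 2 for i in model)

def select_by_penalty(beta_hat, eta):
    """Exhaustive minimization of the criterion over all subsets of the index set."""
    indices = range(len(beta_hat))
    candidates = (
        frozenset(m)
        for size in range(len(beta_hat) + 1)
        for m in combinations(indices, size)
    )
    return min(candidates, key=lambda m: crit(m, beta_hat, eta))

def select_by_threshold(beta_hat, eta):
    """Hard-thresholding rule: keep index i when |beta_hat[i]| >= eta[i]."""
    return frozenset(i for i, (b, e) in enumerate(zip(beta_hat, eta)) if abs(b) >= e)
```

Since the criterion is additive over indices, an index lowers it exactly when $\hat\beta_\lambda^2 > \eta_\lambda^2$, which is why the two selections coincide whenever no coefficient sits exactly on its threshold.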

Let us note that our method has to be performed for signals with infinite support. So, $\Gamma$ may be
infinite, which is not usual in the literature. The following theorem is self-contained; we do not use
the Poisson setting and we do not make any assumption on the distribution of $\hat\beta_\lambda$ or on the form
of the threshold $\eta_\lambda$. So, Theorem 7 can be used for other settings and this is the main reason for
the following very abstract formulation.

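Theorem 7. To estimate a countable family $\beta = (\beta_\lambda)_{\lambda \in \Lambda}$, such that $\|\beta\|_{\ell_2} < \infty$, we assume that
a family of coefficient estimators $(\hat\beta_\lambda)_{\lambda \in \Gamma}$, where $\Gamma$ is a known deterministic subset of $\Lambda$, and a
family of possibly random thresholds $(\eta_\lambda)_{\lambda \in \Gamma}$ are available and we consider the thresholding rule
$\tilde\beta = (\hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_\lambda\}} 1_{\{\lambda \in \Gamma\}})_{\lambda \in \Lambda}$. Let $\varepsilon > 0$ be fixed. Assume that there exist a deterministic family
$(F_\lambda)_{\lambda \in \Gamma}$ and three constants $\kappa \in [0,1[$, $\omega \in [0,1]$ and $\mu > 0$ (that may depend on $\varepsilon$ but not on $\lambda$)
with the following properties.

(A1) For all $\lambda$ in $\Gamma$,

$$P\left(|\hat\beta_\lambda - \beta_\lambda| > \kappa \eta_\lambda\right) \le \omega.$$

(A2) There exist $1 < p, q < \infty$ with $\frac1p + \frac1q = 1$ and a constant $R > 0$ such that for all $\lambda$ in $\Gamma$,

$$\left(E(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{\frac1p} \le R \max\left(F_\lambda, F_\lambda^{\frac1p} \varepsilon^{\frac1q}\right).$$

(A3) There exists a constant $\theta$ such that for all $\lambda$ in $\Gamma$ such that $F_\lambda < \theta\varepsilon$,

$$P\left(|\hat\beta_\lambda - \beta_\lambda| > \kappa \eta_\lambda,\ |\hat\beta_\lambda| > \eta_\lambda\right) \le F_\lambda \mu.$$

Then the estimator $\tilde\beta$ satisfies

$$\frac{1-\kappa^2}{1+\kappa^2} E\|\tilde\beta - \beta\|_{\ell_2}^2 \le E \inf_{m \subset \Gamma} \left\{ \frac{1+\kappa^2}{1-\kappa^2} \sum_{\lambda \notin m} \beta_\lambda^2 + \frac{1-\kappa^2}{\kappa^2} \sum_{\lambda \in m} (\hat\beta_\lambda - \beta_\lambda)^2 + \sum_{\lambda \in m} \eta_\lambda^2 \right\} + LD \sum_{\lambda \in \Gamma} F_\lambda$$

with

$$LD = \frac{R}{\kappa^2}\left(\left(1 + \theta^{-1/q}\right)\omega^{1/q} + \left(1 + \theta^{1/q}\right)\varepsilon^{1/q}\mu^{1/q}\right).$$

Observe that this result makes sense only when $\sum_{\lambda \in \Gamma} F_\lambda < \infty$ and in this case, if $LD$ (which
stands for large deviation inequalities) is small enough, the main term of the right hand side is
given by the first term.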

Now, let us briefly comment on the assumptions of this theorem. The concentration inequality
of Assumption (A1) controls the deviation of $|\hat\beta_\lambda - \beta_\lambda|$ with respect to 0. The family $(F_\lambda)_{\lambda \in \Gamma}$
is introduced for Assumptions (A2) and (A3). Assumption (A2) provides upper bounds for the
moments of $\hat\beta_\lambda$ and looks like a Rosenthal inequality if $F_\lambda$ can be related to the variance of $\hat\beta_\lambda$.
Actually, compactly supported signals can be well estimated by thresholding if sharp concentration
and Rosenthal inequalities are satisfied (see Theorem 3 of [3] and Theorem 3.1 of [26]). In our
set-up where the support of $f$ can be infinite, these basic tools are not sufficient and Assumption (A3)
is introduced to ensure that with high probability, when $F_\lambda$ is small, then either $\beta_\lambda$ is estimated
by 0, or $|\hat\beta_\lambda - \beta_\lambda|$ is small. Remark 1 in Section 4.2 provides additional technical reasons for the
introduction of Assumption (A3) when the support of the signal is infinite. Finally, the condition
$\sum_{\lambda \in \Gamma} F_\lambda < \infty$ shows that the variations of $(\hat\beta_\lambda)_{\lambda \in \Gamma}$ around $(\beta_\lambda)_{\lambda \in \Gamma}$, as pointed out by Assumptions
(A2) and (A3), have to be controlled in a global way.

This theorem applied in the Poisson set-up with $\Gamma = \Gamma_n$ and $\eta_\lambda = \eta_{\lambda,\gamma}$ implies Theorem 1. In
particular the family $(F_\lambda)_\lambda$ is given by $F_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} f(x)dx$, which is related to the variance of
$\hat\beta_\lambda$ (see (4.5)).

Using (2.3), without loss of generality, Theorems 1, 2, 3, 4 and 5 are established by using the
$\ell_2$-norm of coefficients instead of the functional $L_2$-loss. In the following proofs, the values of the
constants $C_1$, $C_2$, $K_1$, $K_2$, $\theta$, ... may change from one proof to another one. Finally, recall that we
have set for any $\lambda \in \Lambda$,

$$\sigma_\lambda^2 = \int \varphi_\lambda^2(x) f(x) dx.$$



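4.2 Proof of Theorem 7

We use the model selection approach. By definition of $\hat m$ one has for any $m \subset \Gamma$,

$$C_n(\tilde\beta) + \mathrm{pen}(\hat m) \le C_n(\hat\beta(m)) + \mathrm{pen}(m).$$

For any family $a = (a_\lambda)_{\lambda \in \Lambda}$, we set

$$\nu(a) = \sum_{\lambda \in \Lambda} a_\lambda (\hat\beta_\lambda - \beta_\lambda).$$

Then, using (4.1),

$$C_n(a) = \|a - \beta\|_{\ell_2}^2 - \|\beta\|_{\ell_2}^2 - 2\nu(a).$$

So,

$$\|\tilde\beta - \beta\|_{\ell_2}^2 \le \|\hat\beta(m) - \beta\|_{\ell_2}^2 + 2\nu(\tilde\beta - \hat\beta(m)) + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$$
$$\le \|\hat\beta(m) - \beta\|_{\ell_2}^2 + 2\nu(\tilde\beta - \bar\beta(m)) - 2\nu(\hat\beta(m) - \bar\beta(m)) + \mathrm{pen}(m) - \mathrm{pen}(\hat m),$$

where $\bar\beta(m) = E(\hat\beta(m))$ is the projection of $\beta$ on the space of the vectors $a = (a_\lambda)_{\lambda \in \Lambda}$ such that
$a_\lambda = 0$ when $\lambda \notin m$ for the $\ell_2$-norm. But,

$$\|\hat\beta(m) - \beta\|_{\ell_2}^2 = \|\hat\beta(m) - \bar\beta(m)\|_{\ell_2}^2 + \|\bar\beta(m) - \beta\|_{\ell_2}^2 = \nu(\hat\beta(m) - \bar\beta(m)) + \|\bar\beta(m) - \beta\|_{\ell_2}^2$$

and

$$2\nu(\tilde\beta - \bar\beta(m)) \le 2\|\tilde\beta - \bar\beta(m)\|_{\ell_2}\, \chi(m \cup \hat m)$$
$$\le 2\|\tilde\beta - \beta\|_{\ell_2}\, \chi(m \cup \hat m) + 2\|\beta - \bar\beta(m)\|_{\ell_2}\, \chi(m \cup \hat m)$$
$$\le \frac{2\kappa^2}{1+\kappa^2}\|\tilde\beta - \beta\|_{\ell_2}^2 + \frac{2\kappa^2}{1-\kappa^2}\|\bar\beta(m) - \beta\|_{\ell_2}^2 + \frac{1}{\kappa^2}\chi^2(m \cup \hat m),$$

where we have set for any $m \subset \Gamma$,

$$\chi(m) = \|\hat\beta(m) - \bar\beta(m)\|_{\ell_2} = \sqrt{\sum_{\lambda \in m} (\hat\beta_\lambda - \beta_\lambda)^2} = \sqrt{\nu(\hat\beta(m) - \bar\beta(m))}$$

and we have used twice the inequality $2ab \le \rho a^2 + \rho^{-1} b^2$ with $\rho = 2\kappa^2(1+\kappa^2)^{-1}$ and $\rho = 2\kappa^2(1-\kappa^2)^{-1}$. Finally,

$$\frac{1-\kappa^2}{1+\kappa^2}\|\tilde\beta - \beta\|_{\ell_2}^2 \le -\|\hat\beta(m) - \bar\beta(m)\|_{\ell_2}^2 + \frac{1+\kappa^2}{1-\kappa^2}\|\bar\beta(m) - \beta\|_{\ell_2}^2 + \frac{1}{\kappa^2}\chi^2(m \cup \hat m) + \mathrm{pen}(m) - \mathrm{pen}(\hat m)$$
$$\le \frac{1+\kappa^2}{1-\kappa^2}\|\bar\beta(m) - \beta\|_{\ell_2}^2 + \left(\frac{1}{\kappa^2} - 1\right)\|\hat\beta(m) - \bar\beta(m)\|_{\ell_2}^2 + \mathrm{pen}(m) + \mathcal A,$$

where

$$\mathcal A = \frac{1}{\kappa^2}\chi^2(\hat m) - \mathrm{pen}(\hat m) = \sum_{\lambda \in \Gamma} \left(\frac{1}{\kappa^2}(\hat\beta_\lambda - \beta_\lambda)^2 - \eta_\lambda^2\right) 1_{\{|\hat\beta_\lambda| \ge \eta_\lambda\}}.$$

A term of this sum can be positive only if $|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda$. Now, we introduce

$$A_1 = E\left[\frac{1}{\kappa^2}\sum_{\lambda \in \Gamma} (\hat\beta_\lambda - \beta_\lambda)^2 1_{\{|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda\}} 1_{\{F_\lambda \ge \theta\varepsilon\}}\right]$$

and

$$A_2 = E\left[\frac{1}{\kappa^2}\sum_{\lambda \in \Gamma} (\hat\beta_\lambda - \beta_\lambda)^2 1_{\{|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda,\ |\hat\beta_\lambda| > \eta_\lambda\}} 1_{\{F_\lambda < \theta\varepsilon\}}\right].$$

Therefore,

$$E[\mathcal A] \le A_1 + A_2.$$

By using the Hölder inequality,

$$A_1 \le \frac{1}{\kappa^2}\sum_{\lambda \in \Gamma} \left(E(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{\frac1p} \left(P(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda)\right)^{\frac1q} 1_{\{F_\lambda \ge \theta\varepsilon\}}$$
$$\le \frac{R}{\kappa^2}\,\omega^{\frac1q}\sum_{\lambda \in \Gamma} \max\left(F_\lambda, F_\lambda^{\frac1p}\varepsilon^{\frac1q}\right) 1_{\{F_\lambda \ge \theta\varepsilon\}}$$
$$\le \frac{R}{\kappa^2}\,\omega^{\frac1q}\left(\sum_{\lambda \in \Gamma} F_\lambda + \theta^{-\frac1q}\sum_{\lambda \in \Gamma} F_\lambda\right)$$
$$\le \frac{R}{\kappa^2}\left(1 + \theta^{-\frac1q}\right)\omega^{\frac1q}\sum_{\lambda \in \Gamma} F_\lambda$$

and

$$A_2 \le \frac{1}{\kappa^2}\sum_{\lambda \in \Gamma} \left(E(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{\frac1p} \left(P(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda,\ |\hat\beta_\lambda| > \eta_\lambda)\right)^{\frac1q} 1_{\{F_\lambda < \theta\varepsilon\}}$$
$$\le \frac{R}{\kappa^2}\sum_{\lambda \in \Gamma} \max\left(F_\lambda, F_\lambda^{\frac1p}\varepsilon^{\frac1q}\right)(F_\lambda\mu)^{\frac1q} 1_{\{F_\lambda < \theta\varepsilon\}}$$
$$\le \frac{R}{\kappa^2}\,\mu^{\frac1q}\left((\theta\varepsilon)^{\frac1q}\sum_{\lambda \in \Gamma} F_\lambda + \varepsilon^{\frac1q}\sum_{\lambda \in \Gamma} F_\lambda\right)$$
$$\le \frac{R}{\kappa^2}\left(1 + \theta^{\frac1q}\right)\varepsilon^{\frac1q}\mu^{\frac1q}\sum_{\lambda \in \Gamma} F_\lambda.$$

So,

$$E(\mathcal A) \le LD \sum_{\lambda \in \Gamma} F_\lambda,$$

which proves Theorem 7.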

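Remark 1. When compactly supported signals are considered, it is natural to take $\Gamma$ satisfying
$\mathrm{card}(\Gamma) < \infty$ and in this case, the upper bound of $E(\mathcal A)$ takes the simpler form:

$$E(\mathcal A) \le \frac{1}{\kappa^2}\sum_{\lambda \in \Gamma} E\left((\hat\beta_\lambda - \beta_\lambda)^2 1_{\{|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda\}}\right)$$
$$\le \frac{1}{\kappa^2}\sum_{\lambda \in \Gamma} \left(E(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{\frac1p}\left(P(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda)\right)^{\frac1q}$$
$$\le \frac{1}{\kappa^2}\,\mathrm{card}(\Gamma)\max_{\lambda \in \Gamma}\left(E(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{\frac1p}\omega^{\frac1q}.$$

Even under a rough control of $\max_{\lambda \in \Gamma} E(|\hat\beta_\lambda - \beta_\lambda|^{2p})$, the term $E(\mathcal A)$ is negligible with respect to the
main term as soon as $\omega$ is small enough, which occurs if the threshold is large enough. In particular,
when restricting our attention to compactly supported signals, Assumption (A3) is useless.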

4.3 Proof of Theorem 1

To prove Theorem 1, we use Theorem 7 with $\hat\beta_\lambda$ defined in (1.3), $\eta_\lambda = \eta_{\lambda,\gamma}$ defined in (1.4) and

$$\Gamma = \Gamma_n = \{\lambda = (j,k) \in \Lambda : -1 \le j \le j_0\} \quad \text{with} \quad 2^{j_0} \le n^c(\log n)^{c'} < 2^{j_0+1}.$$

We set

$$F_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} f(x)dx,$$

so we have:

$$\sum_{\lambda \in \Gamma} F_\lambda = \sum_{-1 \le j \le j_0}\sum_k \int_{x \in \mathrm{supp}(\varphi_{j,k})} f(x)dx \le \int f(x) \sum_{-1 \le j \le j_0}\sum_k 1_{\{x \in \mathrm{supp}(\varphi_{j,k})\}}dx \le (j_0 + 2)\, m_{\phi,\psi}\|f\|_1, \qquad (4.2)$$

where $m_{\phi,\psi}$ is a finite constant depending only on the compactly supported functions $\phi$ and $\psi$.
Finally, $\sum_{\lambda \in \Gamma} F_\lambda$ is bounded by $\log(n)$ up to a constant that only depends on $\|f\|_1$, $c$, $c'$ and the
functions $\phi$ and $\psi$. Now, we give a fundamental lemma to derive Assumption (A1) of Theorem 7.
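Before stating the lemma, the bound (4.2) can be illustrated numerically in the simplest case. The Python sketch below assumes the Haar basis, for which the supports at a given level are disjoint dyadic intervals (so the overlap constant equals 1), and uses a hypothetical intensity shape $f(x) = e^{-|x|}/2$ with $\|f\|_1 = 1$; both choices are made only for illustration. It sums $F_\lambda$ over levels $-1 \le j \le j_0$ and a wide window of translations.

```python
import math

def laplace_cdf(x: float) -> float:
    """CDF of f(x) = exp(-|x|) / 2, a hypothetical non compactly supported shape."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def sum_of_F(j0: int, half_width: int = 40) -> float:
    """Sum of F_lambda over Haar supports [k 2^-j, (k+1) 2^-j), levels -1..j0.

    Translations are truncated to supports inside [-half_width, half_width];
    the mass of f outside this window is negligible for the shape above.
    """
    total = 0.0
    for j in range(-1, j0 + 1):
        scale = 2 ** max(j, 0)  # level -1 (scaling functions) uses unit intervals
        for k in range(-half_width * scale, half_width * scale):
            total += laplace_cdf((k + 1) / scale) - laplace_cdf(k / scale)
    return total
```

Each level contributes $\|f\|_1$, so the sum equals $(j_0 + 2)\|f\|_1$ up to truncation error: the number of coefficients is infinite, yet $\sum_\lambda F_\lambda$ only grows with the number of levels.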



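Lemma 1. For any $u > 0$,

$$P\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{2uV_{\lambda,n}} + \frac{\|\varphi_\lambda\|_\infty u}{3n}\right) \le 2e^{-u}. \qquad (4.3)$$

Moreover, for any $u > 0$,

$$P\left(V_{\lambda,n} \ge \tilde V_{\lambda,n}(u)\right) \le e^{-u},$$

where

$$\tilde V_{\lambda,n}(u) = \hat V_{\lambda,n} + \sqrt{2\hat V_{\lambda,n}\frac{\|\varphi_\lambda\|_\infty^2}{n^2}u} + 3\frac{\|\varphi_\lambda\|_\infty^2}{n^2}u.$$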
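As an illustration of how Lemma 1 feeds the random threshold, the following Python sketch evaluates $\tilde V_{\lambda,n}(u)$ and the threshold $\eta_{\lambda,\gamma} = \sqrt{2\gamma\log n\,\tilde V_{\lambda,n}(\gamma\log n)} + \frac{\gamma\log n}{3n}\|\varphi_\lambda\|_\infty$, in the form in which it is used later in this proof. The function and argument names are hypothetical; `v_hat` stands for the empirical variance estimate $\hat V_{\lambda,n}$ and `sup_norm` for $\|\varphi_\lambda\|_\infty$.

```python
import math

def v_tilde(v_hat: float, sup_norm: float, n: int, u: float) -> float:
    """Upper deviation bound of Lemma 1 for the variance term V_{lambda,n}."""
    a = (sup_norm / n) ** 2 * u
    return v_hat + math.sqrt(2.0 * v_hat * a) + 3.0 * a

def threshold(v_hat: float, sup_norm: float, n: int, gamma: float) -> float:
    """Data-driven threshold eta_{lambda,gamma}, with u = gamma * log(n)."""
    u = gamma * math.log(n)
    return math.sqrt(2.0 * u * v_tilde(v_hat, sup_norm, n, u)) + u * sup_norm / (3.0 * n)
```

When no point of the process falls in the support of $\varphi_\lambda$, one has $\hat V_{\lambda,n} = 0$ and the threshold reduces to $(\sqrt6 + 1/3)\gamma\log n\,\|\varphi_\lambda\|_\infty / n$, which is where the constant $\sqrt6 + 1/3$ of Lemma 3 comes from.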

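Proof. Equation (4.3) comes easily from (2.2) applied with $g = \varphi_\lambda/n$. The same inequality applied
with $g = -\varphi_\lambda^2/n^2$ gives:

$$P\left(V_{\lambda,n} \ge \hat V_{\lambda,n} + \sqrt{2u\int_{\mathbb R}\frac{\varphi_\lambda^4(x)}{n^4}nf(x)dx} + \frac{\|\varphi_\lambda\|_\infty^2}{3n^2}u\right) \le e^{-u}.$$

We observe that

$$\int_{\mathbb R}\frac{\varphi_\lambda^4(x)}{n^4}nf(x)dx \le \frac{\|\varphi_\lambda\|_\infty^2}{n^2}V_{\lambda,n}.$$

So, if we set $a = u\frac{\|\varphi_\lambda\|_\infty^2}{n^2}$, then

$$P\left(V_{\lambda,n} - \sqrt{2V_{\lambda,n}a} - a/3 \ge \hat V_{\lambda,n}\right) \le e^{-u}.$$

We obtain

$$P\left(\sqrt{V_{\lambda,n}} \ge \mathcal V^{-1}(\hat V_{\lambda,n})\right) \le e^{-u}$$

where $\mathcal V^{-1}(\hat V_{\lambda,n})$ is the positive solution of

$$\left(\mathcal V^{-1}(\hat V_{\lambda,n})\right)^2 - \sqrt{2a}\,\mathcal V^{-1}(\hat V_{\lambda,n}) - (a/3 + \hat V_{\lambda,n}) = 0.$$

To conclude, it remains to observe that

$$\tilde V_{\lambda,n}(u) \ge \left(\mathcal V^{-1}(\hat V_{\lambda,n})\right)^2 = \left(\sqrt{\hat V_{\lambda,n} + 5a/6} + \sqrt{a/2}\right)^2.$$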
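Let $\kappa < 1$. Combining these inequalities with $\tilde V_{\lambda,n} = \tilde V_{\lambda,n}(\gamma\log n)$ yields

$$P\left(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_{\lambda,\gamma}\right) \le P\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{2\kappa^2\gamma\log n\,\tilde V_{\lambda,n}} + \frac{\kappa\gamma\log n}{3n}\|\varphi_\lambda\|_\infty\right)$$
$$\le P\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{2\kappa^2\gamma\log n\,\tilde V_{\lambda,n}} + \frac{\kappa\gamma\log n}{3n}\|\varphi_\lambda\|_\infty,\ V_{\lambda,n} \ge \tilde V_{\lambda,n}\right)$$
$$\quad + P\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{2\kappa^2\gamma\log n\,\tilde V_{\lambda,n}} + \frac{\kappa\gamma\log n}{3n}\|\varphi_\lambda\|_\infty,\ V_{\lambda,n} < \tilde V_{\lambda,n}\right)$$
$$\le P\left(V_{\lambda,n} \ge \tilde V_{\lambda,n}\right) + P\left(|\hat\beta_\lambda - \beta_\lambda| \ge \sqrt{2\kappa^2\gamma\log n\,V_{\lambda,n}} + \frac{\kappa\gamma\log n}{3n}\|\varphi_\lambda\|_\infty\right)$$
$$\le n^{-\gamma} + 2n^{-\kappa^2\gamma}$$
$$\le 3n^{-\kappa^2\gamma}.$$

So, for any value of $\kappa \in [0,1[$, Assumption (A1) is true with $\eta_\lambda = \eta_{\lambda,\gamma}$ if we take $\omega = 3n^{-\kappa^2\gamma}$. To
verify the Rosenthal type inequality (A2) of Theorem 7, we prove the following lemma.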

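Lemma 2. For any $p \ge 2$, there exists an absolute constant $C$ such that

$$E\left(|\hat\beta_\lambda - \beta_\lambda|^{2p}\right) \le C^p p^{2p}\left(V_{\lambda,n}^p + \left[\frac{\|\varphi_\lambda\|_\infty}{n}\right]^{2p-2}V_{\lambda,n}\right).$$

Proof. We apply (2.1). Hence,

$$\hat\beta_\lambda - \beta_\lambda = \sum_{i=1}^k \int \frac{\varphi_\lambda(x)}{n}\left(dN^i_x - nk^{-1}f(x)dx\right) = \sum_{i=1}^k Y_i$$

where for any $i$,

$$Y_i = \int \frac{\varphi_\lambda(x)}{n}\left(dN^i_x - nk^{-1}f(x)dx\right).$$

So the $Y_i$'s are i.i.d. centered variables, each of them has a moment of order $2p$. For any $i$, we
apply the Rosenthal inequality (see Theorem 2.5 of [23]) to the positive and negative parts of $Y_i$.
This easily implies that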



E 



8=1 



2p\ 



< I — — max 

M(2p)J 



2p 



i=l 



i=l 



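It remains to bound the upper limit of $E\left(\sum_{i=1}^k |Y_i|^\ell\right)$ for all $\ell \in \{2p, 2\}$ when $k \to \infty$. Let us
introduce

$$\Omega_k = \{\forall i \in \{1,\dots,k\},\ N^i_{\mathbb R} \le 1\}.$$

Then, it is easy to see that $P(\Omega_k^c) \le k^{-1}(n\|f\|_1)^2$ (see e.g., (4.6) below).

On $\Omega_k$, $|Y_i|^\ell = O_k(k^{-\ell})$ if $\int_{\mathbb R} dN^i_x = 0$ and

$$|Y_i|^\ell = \left[\frac{|\varphi_\lambda(T)|}{n}\right]^\ell + O_k\left(k^{-1}\left[\frac{|\varphi_\lambda(T)|}{n}\right]^{\ell-1}\right)$$

if $\int_{\mathbb R} dN^i_x = 1$, where $T$ is the point of the process $N^i$. Consequently,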



1=1 



\TeN 



\MT)\ 



n 



\<Px(T)\ 



n 



+ kO k (k~ 



\ 



E 



vi=l 



But we have 



vi=l 



II || 



(iV^ + AT 1 / |99 A (x)|/(x)dx 



< 2" 



II PA I 



n 



Ni + k(k- L \<p x (x)\f(x)dx 



(4.4) 



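So, when $k \to +\infty$, the last term in (4.4) converges to 0 since a Poisson variable has moments of
every order and

$$\limsup_{k \to \infty} E\left(\sum_{i=1}^k |Y_i|^\ell\right) \le \left[\frac{\|\varphi_\lambda\|_\infty}{n}\right]^{\ell-2}V_{\lambda,n},$$

which concludes the proof.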
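Now,

$$V_{\lambda,n} = \frac1n\int \varphi_\lambda^2(x)f(x)dx \le \frac{\|\varphi_\lambda\|_\infty^2 F_\lambda}{n} \qquad (4.5)$$

and Assumption (A2) is satisfied with $\varepsilon = \frac1n$ and

$$R = \frac{2Cp^2\,2^{j_0}\max(\|\phi\|_\infty^2; \|\psi\|_\infty^2)}{n}$$

since $\|\varphi_\lambda\|_\infty^2 \le 2^{j_0}\max(\|\phi\|_\infty^2; \|\psi\|_\infty^2)$ and

$$\left(E(|\hat\beta_\lambda - \beta_\lambda|^{2p})\right)^{\frac1p} \le Cp^2\left(\frac{\|\varphi_\lambda\|_\infty^2 F_\lambda}{n} + \frac{\|\varphi_\lambda\|_\infty^2}{n}F_\lambda^{\frac1p}n^{-\frac1q}\right).$$

Finally, Assumption (A3) comes from the following lemma.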
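Lemma 3. We set

$$N_\lambda = \int_{\mathrm{supp}(\varphi_\lambda)} dN \quad \text{and} \quad C' = (\sqrt6 + 1/3)\gamma \ge \sqrt6 + 1/3.$$

There exists an absolute constant $0 < \theta' < 1$ such that if $nF_\lambda \le \theta'C'\log n$ and $(1-\theta')(\sqrt6 + 1/3)\log n \ge 2$, then

$$P\left(N_\lambda - nF_\lambda \ge (1-\theta')C'\log n\right) \le F_\lambda n^{-\gamma}.$$

Remark 2. We can take $\theta' = 0.01$ and in this case, the result is true as soon as $n \ge 3$.

Proof. One takes $\theta' \in [0,1]$ (for instance $\theta' = 0.01$) such that

$$\frac{3(1-\theta')^2}{2(2\theta'+1)}(\sqrt6 + 1/3) \ge 4.$$

We use Equation (5.2) of [30] to obtain

$$P\left(N_\lambda - nF_\lambda \ge (1-\theta')C'\log n\right) \le \exp\left(-\frac{((1-\theta')C'\log n)^2}{2\left(nF_\lambda + (1-\theta')C'\log n/3\right)}\right) \le n^{-\frac{3(1-\theta')^2}{2(2\theta'+1)}C'}.$$

If $nF_\lambda \ge n^{-\gamma-1}$, since $\frac{3(1-\theta')^2}{2(2\theta'+1)}C' \ge 4\gamma \ge \gamma + 2$, the result is true. If $nF_\lambda \le n^{-\gamma-1}$,

$$P\left(N_\lambda - nF_\lambda \ge (1-\theta')C'\log n\right) \le P\left(N_\lambda > (1-\theta')C'\log n\right) \le P(N_\lambda \ge 2) \le \sum_{k \ge 2}\frac{(nF_\lambda)^k}{k!}e^{-nF_\lambda} \le (nF_\lambda)^2 \qquad (4.6)$$

and the result is true. ■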
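Two elementary facts used in this proof are easy to check numerically. The Python sketch below (function names are hypothetical) computes $P(N \ge 2)$ for a Poisson variable, so that the bound $P(N \ge 2) \le (nF_\lambda)^2$ of (4.6) can be verified for a given mean, and finds the smallest $n$ satisfying the condition $(1-\theta')(\sqrt6 + 1/3)\log n \ge 2$ of Lemma 3.

```python
import math

def prob_at_least_two(mean: float) -> float:
    """P(N >= 2) for a Poisson variable N with the given mean.

    Uses 1 - exp(-mean) = -expm1(-mean) for better accuracy at small means.
    """
    return -math.expm1(-mean) - mean * math.exp(-mean)

def smallest_admissible_n(theta_prime: float = 0.01) -> int:
    """Smallest n >= 2 with (1 - theta') (sqrt(6) + 1/3) log(n) >= 2."""
    c = (1.0 - theta_prime) * (math.sqrt(6.0) + 1.0 / 3.0)
    n = 2
    while c * math.log(n) < 2.0:
        n += 1
    return n
```

With the default `theta_prime = 0.01`, the search stops at the value quoted in Remark 2.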
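Now, observe that if $|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}$ then

$$N_\lambda \ge C'\log n.$$

Indeed, $|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}$ implies

$$\frac{C'\log n}{n}\|\varphi_\lambda\|_\infty \le |\hat\beta_\lambda| \le \frac{\|\varphi_\lambda\|_\infty N_\lambda}{n}.$$

So if $n$ satisfies $(1-\theta')(\sqrt6 + 1/3)\log n \ge 2$, we set $\theta = \theta'C'\log(n)$ and $\mu = n^{-\gamma}$. In this case,
Assumption (A3) is fulfilled since if $nF_\lambda \le \theta'C'\log n$,

$$P\left(|\hat\beta_\lambda - \beta_\lambda| > \kappa\eta_\lambda,\ |\hat\beta_\lambda| > \eta_\lambda\right) \le P\left(N_\lambda - nF_\lambda \ge (1-\theta')C'\log n\right) \le F_\lambda n^{-\gamma}.$$

Finally, if $n$ satisfies $(1-\theta')(\sqrt6 + 1/3)\log n \ge 2$, Theorem 7 applies:

$$\frac{1-\kappa^2}{1+\kappa^2}E\|\tilde\beta - \beta\|_{\ell_2}^2 \le E\inf_{m \subset \Gamma_n}\left\{\frac{1+\kappa^2}{1-\kappa^2}\sum_{\lambda \notin m}\beta_\lambda^2 + \frac{1-\kappa^2}{\kappa^2}\sum_{\lambda \in m}(\hat\beta_\lambda - \beta_\lambda)^2 + \sum_{\lambda \in m}\eta_{\lambda,\gamma}^2\right\} + LD\sum_{\lambda \in \Gamma_n}F_\lambda. \qquad (4.7)$$

In addition, there exists a constant $K_1$ depending on $p$, $\gamma$, $c$, $c'$, $\|f\|_1$ and on $\Phi$ such that

$$LD\sum_{\lambda \in \Gamma_n}F_\lambda \le K_1(\log(n))^{c'+1}n^{c - \frac{\kappa^2\gamma}{q} - 1}. \qquad (4.8)$$

Since $\gamma > c$, one takes $\kappa < 1$ and $q > 1$ such that $c < \frac{\kappa^2\gamma}{q}$ and as required by Theorem 1, the last
term satisfies

$$LD\sum_{\lambda \in \Gamma_n}F_\lambda \le \frac{K_2}{n},$$

where $K_2$ is a constant. Before evaluating the first term, let us state the following lemma.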
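Lemma 4. We set

$$S_\varphi = \max\left\{\sup_{x \in \mathrm{supp}(\phi)}|\phi(x)|,\ \sup_{x \in \mathrm{supp}(\psi)}|\psi(x)|\right\}$$

and

$$I_\varphi = \min\left\{\inf_{x \in \mathrm{supp}(\phi)}|\phi(x)|,\ \inf_{x \in \mathrm{supp}(\psi)}|\psi(x)|\right\}.$$

Using (2.1), we define $\Theta_\varphi = \frac{S_\varphi^2}{I_\varphi^2}$. For all $\lambda \in \Lambda$, we have the following result.

- If $F_\lambda \le \Theta_\varphi\frac{\log(n)}{n}$, then $\beta_\lambda^2 \le \Theta_\varphi^2\sigma_\lambda^2\frac{\log(n)}{n}$.

- If $F_\lambda > \Theta_\varphi\frac{\log(n)}{n}$, then $\|\varphi_\lambda\|_\infty\frac{\log(n)}{n} \le \sigma_\lambda\sqrt{\frac{\log(n)}{n}}$.

Proof. We note $\lambda = (j,k)$ and assume that $j \ge 0$ (arguments are similar for $j = -1$).
If $F_\lambda \le \Theta_\varphi\frac{\log(n)}{n}$, we have

$$|\beta_\lambda| \le S_\varphi 2^{j/2}F_\lambda \le S_\varphi 2^{j/2}\sqrt{F_\lambda}\sqrt{\Theta_\varphi}\sqrt{\frac{\log(n)}{n}} \le S_\varphi\sqrt{\Theta_\varphi}\,I_\varphi^{-1}\sigma_\lambda\sqrt{\frac{\log(n)}{n}} \le \Theta_\varphi\sigma_\lambda\sqrt{\frac{\log(n)}{n}}$$

since $\sigma_\lambda^2 \ge I_\varphi^2 2^j F_\lambda$. For the second point, observe that

$$\sigma_\lambda\sqrt{\frac{\log(n)}{n}} \ge 2^{j/2}I_\varphi\sqrt{F_\lambda}\sqrt{\frac{\log(n)}{n}} > 2^{j/2}I_\varphi\sqrt{\Theta_\varphi}\,\frac{\log(n)}{n} = 2^{j/2}S_\varphi\frac{\log(n)}{n} \ge \|\varphi_\lambda\|_\infty\frac{\log(n)}{n}.$$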

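Now, for any $\delta > 0$,

$$E(\eta_{\lambda,\gamma}^2) \le (1+\delta)2\gamma\log n\,E(\tilde V_{\lambda,n}) + (1+\delta^{-1})\left(\frac{\gamma\log n}{3n}\right)^2\|\varphi_\lambda\|_\infty^2.$$

Moreover,

$$E(\tilde V_{\lambda,n}) \le (1+\delta)V_{\lambda,n} + (1+\delta^{-1})3\gamma\log n\frac{\|\varphi_\lambda\|_\infty^2}{n^2}.$$

So,

$$E(\eta_{\lambda,\gamma}^2) \le (1+\delta)^2 2\gamma\log n\,V_{\lambda,n} + \Delta(\delta)\left(\frac{\gamma\log n}{n}\right)^2\|\varphi_\lambda\|_\infty^2, \qquad (4.9)$$

with $\Delta(\delta)$ a constant depending only on $\delta$. Now, we apply (4.7) with

$$m = \left\{\lambda \in \Gamma_n : \beta_\lambda^2 > \Theta_\varphi^2\frac{\sigma_\lambda^2}{n}\log n\right\},$$

so using Lemma 4, we can claim that for any $\lambda \in m$, $F_\lambda > \Theta_\varphi\frac{\log(n)}{n}$. Finally, since $\Theta_\varphi \ge 1$,

$$E\|\tilde\beta - \beta\|_{\ell_2}^2 \le K_3\left[\sum_{\lambda \in \Gamma_n}\beta_\lambda^2 1_{\{\beta_\lambda^2 \le \Theta_\varphi^2 V_{\lambda,n}\log n\}} + \sum_{\lambda \notin \Gamma_n}\beta_\lambda^2 + \sum_{\lambda \in \Gamma_n}\left(\frac{\log n}{n}\sigma_\lambda^2 + \left(\frac{\log n}{n}\right)^2\|\varphi_\lambda\|_\infty^2\right)1_{\{\beta_\lambda^2 > \Theta_\varphi^2 V_{\lambda,n}\log n,\ F_\lambda > \Theta_\varphi\frac{\log n}{n}\}}\right] + \frac{K_4}{n}$$
$$\le K_3\left[\sum_{\lambda \in \Gamma_n}\left(\beta_\lambda^2 1_{\{\beta_\lambda^2 \le \Theta_\varphi^2 V_{\lambda,n}\log n\}} + 2\log n\,V_{\lambda,n}1_{\{\beta_\lambda^2 > \Theta_\varphi^2 V_{\lambda,n}\log n\}}\right) + \sum_{\lambda \notin \Gamma_n}\beta_\lambda^2\right] + \frac{K_4}{n}$$
$$\le 2K_3\left[\sum_{\lambda \in \Gamma_n}\min\left(\beta_\lambda^2, \Theta_\varphi^2 V_{\lambda,n}\log n\right) + \sum_{\lambda \notin \Gamma_n}\beta_\lambda^2\right] + \frac{K_4}{n},$$

where the constant $K_3$ depends on $\gamma$ and $c$ and $K_4$ depends on $\gamma$, $c$, $c'$, $\|f\|_1$ and on $\Phi$. Theorem 1
is proved by using properties of the biorthogonal wavelet basis.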






4.4 Proof of Theorem 2

Let us assume that $f$ belongs to $\mathcal B^s_{2,\Gamma}(R^{1-2s}) \cap W_s(R) \cap L_1(R) \cap L_2(R)$. The oracle inequality of Theorem 1
implies that, for all $n$,



E(ll/n, 7 - /|?) < Cl 



Aer„ 



I/3aI<<xaV¥ 



A^r, 



re 



where Ci and C 2 are two constants. But we have 



Aer„ 



l/3Al>-A V /Ii r 



+oo 



fc=0 



Aer n 

+oo 

< E 2 ^'E^i 

fc=o aga 



fc+l n 



2- 4s f 2 *±i . /logn 



fc=0 



Is 



+oo 



< ^ 2 - 4s < s E 2 ~ 



fc+2s(fc+l) 



fc=0 



and 



So, 



Agr„ 



C 2 



fn, 7 - /111) < C( 7 , c, <?>, s)R 2 - 4s pl s + — 

re 



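where $C(\gamma, c, \Phi, s)$ depends on $\gamma$, $c$, $\Phi$ and $s$. Hence,

$$E\left(\|\tilde f_{n,\gamma} - f\|_2^2\right) \le C(\gamma, c, \Phi, s)R^{2-4s}\rho_{n,s}^2(1 + o_n(1))$$

and $f$ belongs to $MS(\tilde f_\gamma, \rho_s)(R')$ for $R'$ large enough.

Conversely, let us suppose that $f$ belongs to $MS(\tilde f_\gamma, \rho_s)(R') \cap L_1(R') \cap L_2(R')$. Then, for any $n$,

$$E\left(\|\tilde f_{n,\gamma} - f\|_2^2\right) \le R'^2\left(\frac{\log n}{n}\right)^{2s}.$$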



Consequently, there exists i? depending on R' and such that for any n, 



2* 



This implies that / belongs to -B| r (-R). 

Now, we want to prove that / G W S (R) if R is large enough. We have 



E^l 



AeA 



\0\\<v^ 



2n A^r„ Aer n 
But $\tilde\beta_\lambda = \hat\beta_\lambda 1_{\{|\hat\beta_\lambda| \ge \eta_{\lambda,\gamma}\}}$, so,
So, for any n, 



< E^+E E K& - a) 2 ] + E ^\ M< ^j^ K (i m> ^) 

^ + ]Te[(/3 a -/3 a ) 2 ] + ^/? 2 p(<t a1 

Agr„ Aer n Aer„ V 



7log n ^7 
2n 2 



< E(||/3-/3| 2 2 )+ ^ /3 2 P U A 
Aer n V 



2n 2 



Using Lemma [H 



and 



\ 4s 

logn 



Since this is true for every n, we have for any t < 1, 

AeA 



2 ^ u 



7 

where i? is a constant large enough depending on R' and <P. Note that 

supt- 4 ^^^ < ||/3|| 2 2 _ 

*^ AeA 

We conclude that 

/ G 5| ir (i2) n 

for R large enough. 

4.5 Proof of Proposition 2

Since $\beta < \frac12$, $f_\beta \in L_1 \cap L_2$. If the Haar basis is considered, the wavelet coefficients $\beta_{j,k}$ of $f_\beta$ can
be calculated and we obtain for any $j \ge 0$, for any $k \notin \{0,\dots,2^j-1\}$, $\beta_{j,k} = 0$ and for any $j \ge 0$,
for any $k \in \{0,\dots,2^j-1\}$,

$$\beta_{j,k} = (1-\beta)^{-1}2^{-j\left(\frac12-\beta\right)}\left[2\left(k+\tfrac12\right)^{1-\beta} - k^{1-\beta} - (k+1)^{1-\beta}\right]$$

and there exists a constant $0 < c_{1,\beta} < \infty$ only depending on $\beta$ such that

$$\lim_{k \to \infty} 2^{j\left(\frac12-\beta\right)}k^{1+\beta}\beta_{j,k} = c_{1,\beta}.$$

Moreover the $\beta_{j,k}$'s are strictly positive. Consequently they can be upper and lower bounded, up
to a constant, by $2^{-j\left(\frac12-\beta\right)}k^{-(1+\beta)}$. Similarly, for any $j \ge 0$, for any $k \in \{0,\dots,2^j-1\}$,

$$\sigma_{j,k}^2 = (1-\beta)^{-1}2^{j\beta}\left((k+1)^{1-\beta} - k^{1-\beta}\right)$$

and there exists a constant $0 < c_{2,\beta} < \infty$ only depending on $\beta$ such that

$$\lim_{k \to \infty} 2^{-j\beta}k^{\beta}\sigma_{j,k}^2 = c_{2,\beta}.$$

There exist two constants k((3) and only depending on (3 such that for any < t < 1, if 

and 

K(^r^2 3 '(^) > 2 J ' ^ 2 j < «'09)t"f. 
So, if 2^' < K'{(3)t-i , since /3 jfe = for k>2 j , 

fcez 

We obtain 

+00 2J-1 

E^iais^ £ oot e ^^"'w^-i E fc - 2 -^ £ cot* 1 *. 

AeA i=-i fc=i 

where C(/3) and C"(/?) denote two constants only depending on /3. So, for any < s < |, if we take 
[3 < 5(1 — 6s), then, for any < i < 1, £ 3 < t 4s . Finally, there exists c > 1, such that for any n, 

A^r n 

where R > 0. And in this case, 

//? ^ Loo , //? € B| )00 n W s := M5(/f , p a ). 

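4.6 Proof of Theorem 3

Using the maxiset results of Section 3.1, since

$$MS\left(\tilde f_\gamma, \rho_{\frac{\alpha}{1+2\alpha}}\right) := \mathcal B^{\frac{\alpha}{c(1+2\alpha)}}_{2,\infty} \cap W_{\frac{\alpha}{1+2\alpha}},$$

it is enough to show that

$$\mathcal B^\alpha_{p,q}(R) \cap \mathcal L_{1,2,\infty}(R') \subset \mathcal B^{\frac{\alpha}{c(1+2\alpha)}}_{2,\infty}(R'') \cap W_{\frac{\alpha}{1+2\alpha}}(R'')$$

for $R'' > 0$ (see (3.1)). Let $f \in \mathcal B^\alpha_{p,\infty}(R) \cap \mathcal L_{1,2,\infty}(R')$. We first prove that $f \in W_{\frac{\alpha}{1+2\alpha}}(R'')$ for $R''$
large enough. Since for any $\lambda = (j,k)$,

$$\sigma_\lambda^2 \le \min\left(\max(2^j; 1)\|\varphi\|_\infty^2 F_{j,k}\ ;\ \|f\|_\infty\|\varphi\|_2^2\right),$$

where $\varphi \in \{\phi, \psi\}$ according to the value of $j$, we have for any $t > 0$ and any $J \ge 0$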

2-p 



* ££^ 2 +££&(;g£) 



< maxdi^L ; ML)* 2 E max ( 2J ; !) E ^ + E ( vl/looM 



2-p 



i<J fc j>J 



A' 



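where $C(\Phi, R')$ is a constant only depending on $\Phi$ and on $R'$. Indeed, we have used that

$$\sum_k F_{j,k} \le m_{\phi,\psi}\|f\|_1 \qquad (4.10)$$

by similar arguments to (4.2). Now, since $f$ belongs to $\mathcal B^\alpha_{p,\infty}(R)$ (that contains $\mathcal B^\alpha_{p,q}(R)$, see Section
2.2), with $\alpha + \frac12 - \frac1p > 0$,

$$\sum_\lambda \beta_\lambda^2 1_{\{|\beta_\lambda| \le \sigma_\lambda t\}} \le C_1(\Phi, \alpha, p, R')\left(2^J t^2 + t^{2-p}R^p 2^{-Jp\left(\alpha+\frac12-\frac1p\right)}\right),$$

where $C_1(\Phi, \alpha, p, R')$ depends on $\Phi$, $\alpha$, $p$ and $R'$. With $J$ such that

$$2^J \le R^{\frac{2}{1+2\alpha}}t^{-\frac{2}{1+2\alpha}} < 2^{J+1},$$

$$\sum_\lambda \beta_\lambda^2 1_{\{|\beta_\lambda| \le \sigma_\lambda t\}} \le C_2(\Phi, \alpha, p, R')R^{\frac{2}{1+2\alpha}}t^{\frac{4\alpha}{1+2\alpha}}$$

where $C_2(\Phi, \alpha, p, R')$ depends on $\Phi$, $\alpha$, $p$ and $R'$. So, $f$ belongs to $W_{\frac{\alpha}{1+2\alpha}}(R'')$ for $R''$ large enough.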

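Furthermore, using (2.5), if $p \le 2$ and

$$\alpha\left(1 - \frac{1}{c(1+2\alpha)}\right) \ge \frac1p - \frac12,$$

$$\mathcal B^\alpha_{p,\infty}(R) \subset \mathcal B^{\frac{\alpha}{c(1+2\alpha)}}_{2,\infty}(R).$$

Finally, for $R''$ large enough,

$$\mathcal B^\alpha_{p,q}(R) \cap \mathcal L_{1,2,\infty}(R') \subset \mathcal B^\alpha_{p,\infty}(R) \cap \mathcal L_{1,2,\infty}(R') \subset \mathcal B^{\frac{\alpha}{c(1+2\alpha)}}_{2,\infty}(R'') \cap W_{\frac{\alpha}{1+2\alpha}}(R'').$$

4.7 Proof of Theorem 4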

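In this subsection since $\alpha > 0$ and $p \ge 2$, we set

$$s = \frac{\alpha}{2\alpha + 2 - \frac2p}.$$

Using the maxiset results of Section 3.1, since

$$MS(\tilde f_\gamma, \rho_s) := \mathcal B^s_{2,\Gamma} \cap W_s,$$

it is enough to show that

$$\mathcal B^\alpha_{p,q}(R) \cap L_1(R') \cap L_2(R') \subset \mathcal B^s_{2,\Gamma}(R'') \cap W_s(R'')$$

for $R'' > 0$ (see (3.1)). By using (2.5), since $c \ge 1$, we have

$$\mathcal B^\alpha_{p,q}(R) \subset \mathcal B^\alpha_{p,\infty}(R) \subset \mathcal B^s_{2,\Gamma}(R).$$

Let $f \in \mathcal B^\alpha_{p,q}(R) \cap L_1(R') \cap L_2(R')$. We prove that $f \in W_s(R'')$ for $R''$ large enough. Using
computations of Section 4.6, we have for any $t > 0$ and any $J \ge 0$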

A \ j>J k ) 

where C(<P,R') is a constant only depending on <3> and on R' . Now, let us bound for all j > J 



Let us apply the Holder inequality. Since p > 2, we have 2 — -^y > and 



i 

p-i 



E 



2 ^ 

p-1 



Since / e ^ i0O (i?), 



Since / e Li(#), 



by using ()4.10p . Hence 



i 

p-i 



Ei^i 



22 / /(x)V'(2 i x - 



< 2i|^||ooE^ 

< 2^1^1100^1/1! 



£/^<^ 



oo 



, jR ') 2 -^2^ Q ^. 
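Finally,

$$\sum_\lambda \beta_\lambda^2 1_{\{|\beta_\lambda| \le \sigma_\lambda t\}} \le C_1(\Phi, R')\left(2^J t^2 + R^{\frac{p}{p-1}}2^{-\frac{J\alpha p}{p-1}}\right)$$

where $C_1(\Phi, R')$ is a constant only depending on $\Phi$ and on $R'$. With $J$ such that

$$2^J \le R^{\frac{p}{\alpha p+p-1}}t^{-\frac{2(p-1)}{\alpha p+p-1}} < 2^{J+1},$$

$$\sum_\lambda \beta_\lambda^2 1_{\{|\beta_\lambda| \le \sigma_\lambda t\}} \le C_2(\Phi, \alpha, p, R')R^{\frac{1}{\alpha+1-\frac1p}}t^{\frac{2\alpha}{\alpha+1-\frac1p}}$$

where $C_2(\Phi, \alpha, p, R')$ depends on $\Phi$, $\alpha$, $p$ and $R'$. So, $f$ belongs to $W_s(R'')$ for $R''$ large enough.
Finally, for $R''$ large enough,

$$\mathcal B^\alpha_{p,q}(R) \cap L_1(R') \cap L_2(R') \subset \mathcal B^s_{2,\Gamma}(R'') \cap W_s(R'').$$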

4.8 Proof of Theorem 5

To establish the lower bound stated in Theorem 5, we first consider $p > 2$ and $0 < \alpha < r + 1$. As
usual, the lower bound of the risk

$$\mathcal R_n(\alpha, p) = \inf_{\hat f}\ \sup_{f \in \mathcal B^\alpha_{p,\infty}(R) \cap L_1(R_1) \cap L_2(R_2) \cap L_\infty(R_\infty)} E\left[\|\hat f - f\|_2^2\right]$$

where $R$, $R_1$, $R_2$ and $R_\infty$ are positive real numbers, can be obtained by using an adequate version
of Fano's lemma based on the Kullback-Leibler divergence. We first give classical lemmas that
introduce constants useful in the sequel. The first result recalls the Kullback-Leibler divergence for
Poisson processes (see 0).

Lemma 5. Let $N$ and $N'$ be two Poisson processes on $\mathbb R$ whose intensities with respect to the
Lebesgue measure are respectively $s$ and $s'$. We denote by $\mathbb P$ (respectively $\mathbb Q$) the probability measure
associated with $s$ (respectively with $s'$). Then, the Kullback-Leibler divergence between $\mathbb P$ and $\mathbb Q$ is

$$K(\mathbb P, \mathbb Q) = \int_{\mathbb R} s(x)\,\phi\left(\log\frac{s'(x)}{s(x)}\right)dx$$

where $\phi(u) = \exp(u) - u - 1$.
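The formula of Lemma 5 is straightforward to evaluate numerically. The following Python sketch approximates the integral by a midpoint rule on a bounded interval $[a, b]$; the restriction to a bounded interval, the requirement that both intensities be positive there, and the function names are assumptions made for this illustration only.

```python
import math

def phi(u: float) -> float:
    """phi(u) = exp(u) - u - 1, as in Lemma 5."""
    return math.exp(u) - u - 1.0

def kl_poisson_processes(s, s_prime, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of s * phi(log(s'/s)) over [a, b].

    Assumes s and s_prime are callables that are strictly positive on [a, b].
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += s(x) * phi(math.log(s_prime(x) / s(x)))
    return total * h
```

For constant intensities $s \equiv \lambda_1$ and $s' \equiv \lambda_2$ on $[0,1]$, the integrand is constant and the result reduces to $\lambda_2 - \lambda_1 - \lambda_1\log(\lambda_2/\lambda_1)$, the Kullback-Leibler divergence between two Poisson variables with means $\lambda_1$ and $\lambda_2$.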

Now, let us give the following version of Fano's lemma, derived from 

Lemma 6. Let $(P_i)_{i \in \{0,\dots,n\}}$ be a finite family of probability measures defined on the same
measurable space $\Omega$. One sets

$$\bar K_n = \frac1n\sum_{i=1}^n K(P_i, P_0).$$

Then, there exists an absolute constant $B$ ($B = 0.71$ works) such that if $\hat\imath$ is a random variable on
$\Omega$ with values in $\{0,\dots,n\}$, one has

$$\inf_{0 \le i \le n} P_i(\hat\imath = i) \le \max\left(B, \frac{\bar K_n}{\log(n+1)}\right).$$

Finally, we recall a combinatorial lemma due to Gallager (see Lemma 8 in [30]).
Lemma 7. Let $\Gamma$ be a finite set with cardinal $Q$. Let $D \le Q$. There exist absolute constants $\theta$
and $\sigma$ such that there exists $\mathcal M_D \subset \mathcal P(\Gamma)$, satisfying $\log|\mathcal M_D| \ge \sigma D$ if $D = Q$ and $\log|\mathcal M_D| \ge
\sigma D\log(Q/D)$ if $D < Q$ and such that for all distinct sets $m$ and $m'$ belonging to $\mathcal M_D$ we have
$|m \Delta m'| \ge \theta D$.

Now, we are ready to provide a lower bound for 1Z n (a,p). For this purpose, for a given n large 
enough, we set j the largest integer such that 



2 J < 



R \ a+1 -p / Ri 



pa-\-p— 1 



2Bac 2 (<£) " 1 \ 2Bac 2 (#) " 1 c& 



a + l- 

n 



The constant C2(<£) was defined in Section [2.21 and is a constant depending only on ip such that 



We set for any £, 

. JO I" «(!-«)/ 

aw = — \ t — \ + l ]u+i]\ x >- 

Jo ex P (-5(i=5yJ du 

Note that 5 = \\ge\\i does not depend on £. We also introduce the integer D such that D2~ 3 is 
the largest integer satisfying 

D2 -o < Rin2 ~\ . - 25. (4.11) 

In particular, D2~' J goes to oo when n goes to oo. Using Lemma [7] with T = {0, 1, . . . , D — 1} and 
Q = D, we extract .Md for which both properties stated in Lemma [7] are satisfied and we set 



Cj,D = I fm = fj,D + a,j ^2 $j,k '■ m G Md > , 



with 

Bac 2 (^)- 1 cr2 



n 

The function is defined by 

fjM x ) = P 1 %D2-j]{x) + P9-i(x) + pg-m-i-i(.-x) 

where 

R^D' 1 

P 



1 + 2523 D- 1 ' 

Let / m G Observe that the support of X^fcemV'i.fc is included in [—1,D2~ 3 + 1] for n large 

enough. In this case, since p > 2aj2^c^ (see ()4.1ip ). we have for x in the support of J2kem $j,k 

fm(x) > P -. (4.12) 
In addition for any $x$, $f_m(x) \ge 0$. Now, we verify that $f_m$ belongs to $\mathcal B^\alpha_{p,\infty}(R) \cap L_1(R_1) \cap L_2(R_2) \cap
L_\infty(R_\infty)$. We have:

||/m||a,p,oo — \fj,D\oc,p,oc ^ ~] 1pj,k\ot,p,oo 



< \\fj,D\\a,p,oo + DPaj2 



— I fj,D ||a,p,oo 



( R in 2-j \ p Bac 2 ($)- 1 cj > 2 



\2Bac 2 ($)- 1 c 2 . 



f . nil + 2 , ' (a+1 ~* ) 



R 



n 



^2Bac 2 (#)-i C 2 



— \\fj,D\\a,p,oo 4" ^ ■ 



Finally, /^d has an infinite number of continuous derivatives bounded (up to constants) by p and 
||/i,D||a,p,oo is bounded (up to a constant) by p(D2~- ? ) 1 / p that goes to when n goes to do. So, for 
n large enough, 

||/rra|a,p,oo — R- 

Now, it remains to verify that f m S Li(i?i) n ^{Ri) H L 00 (i? 00 ). We have 



l/mloo < P + C^22 aj < RiVD- 1 + 
for n large enough. Using again (|4.1ip . 



Bac 2 (4>)^c^ 



< Ra 



n 



I/mil < 211/^11 + 2\\ aj ^ 4 fe ||| < 2p 2 ( J D2-J + 2<J) + 2c 2 {<P)Da) < 2pR x + RxB ^ < R\ 
for n large enough. Since f m > 0, 

II /m||l 



/ ( fjA x ) + a i S ^ fc ( x ) )dx = P D2 j + 25 p = R x . 



K n (a,p) >inf sup E \\f - ff 2 



Finally, we have: 

TZ n (a,p) > : 

If / is an estimator, we can define /' = argminjg^ D \\t — f\\ 2 . Then, for / G Cj t D, 

||/ / -/||2<||/ , -/||2 + ||/-/|| 2 <2||/-/|| 2 

and 

K n (a,p) > - ml sup e[||/-/||| 
Moreover if m and m' belong to Md with m ^ ml , 



||/m - /m'lll > ci(#)o?|mAm'| > cx{$)9Da] 
where $c_1(\Phi)$ has been defined in Section 2.2. Hence



n n (a,p) > ^p-eDa* inf sup P/(/V /)• 
1 feC D feC D 



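To apply Lemma 6, we need to compute $\bar K_n$. For any distinct sets $m$ and $m'$ belonging to $\mathcal M_D$,
since for any $x > -1$, $\log(1+x) \ge x/(1+x)$ and by using (4.12), we have

$$K(P_{f_{m'}}, P_{f_m}) = \int f_{m'}\,\phi\left(\log\frac{f_m}{f_{m'}}\right)n\,dx$$
$$= \int\left[f_m - f_{m'} - f_{m'}\log\left(1 + \frac{f_m - f_{m'}}{f_{m'}}\right)\right]n\,dx$$
$$\le \int\frac{(f_m - f_{m'})^2}{f_m}(x)\,n\,dx$$
$$\le \frac{2}{\rho}n\|f_{m'} - f_m\|_2^2 \qquad (4.13)$$
$$\le \frac{2na_j^2Dc_2(\Phi)}{\rho}$$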

and K n < — 3 -r— - ■ By applying Lemma El since 



we have 



2c 2 (#)nDa? 



- ^ (1 - B) 2Bac 2 (^)-i^ (1 + <*(!)) 



i 



> CR a+1 -K Q+1 -p(l + o n (l)), 
where C is a constant that depends on q, p, C2(^), c^, 9, B, a and which is the stated result. 



For the case $p \le 2$, by using computations similar to those of Theorem 2 of [la ], it is easy to
prove that the minimax risk associated to the set of functions supported by $[0,1]$ and belonging to
$\mathcal B^\alpha_{p,q}(R)$ for $0 < \alpha < r + 1$ is larger than $n^{-\frac{2\alpha}{1+2\alpha}}$ up to a constant.

Finally, the adaptive properties of $\tilde f_\gamma$ are proved by combining Theorems 3 and 4 and the
previous lower bound.

4.9 Proof of Theorem 6

Let us consider the Haar basis. For $j \ge 0$ and $D \in \{0, 1, \dots, 2^j\}$, we set

$$\mathcal C_{j,D} = \left\{f_m = \rho 1_{[0,1]} + a_{j,D}\sum_{k \in m}\tilde\psi_{j,k} : |m| = D,\ m \subset \mathcal N_j\right\},$$

where

$$\mathcal N_j = \{k : \tilde\psi_{j,k} \text{ has support in } [0,1]\}.$$

The parameters $j$, $D$, $\rho$, $a_{j,D}$ are chosen later to fulfill some requirements. Note that

$$N_j = \mathrm{card}(\mathcal N_j) = 2^j.$$

We know that there exists a subset of $\mathcal C_{j,D}$, denoted $\mathcal M_{j,D}$, and some universal constants, denoted
$\theta$ and $\sigma$, such that for all $m, m' \in \mathcal M_{j,D}$,

$$\mathrm{card}(m \Delta m') \ge \theta D, \qquad \log(\mathrm{card}(\mathcal M_{j,D})) \ge \sigma D\log\left(\frac{2^j}{D}\right)$$

(see Lemma 7). Now, let us describe all the requirements necessary to obtain the lower bound of
the risk.

• To ensure f m > and the equivalence between the Kullback distance and the L2-norm (see 
below), the / m 's have to be larger than p/2. Since the (f>j : kS have disjoint support, this means 
that 

p> 2 1+j / 2 \ aj , D \. (4.14) 

• We need the $f_m$'s to be in $L_1(R'') \cap L_\infty(R'')$. Since $\|f_m\|_1 = \rho$ and $\|f_m\|_\infty = \rho + 2^{j/2}|a_{j,D}|$, we need

$$\rho + 2^{j/2} |a_{j,D}| \le R''. \qquad (4.15)$$

• The $f_m$'s have to belong to $\mathcal{B}^s_{2,\Gamma}(R')$, i.e.

$$\rho + 2^{js} \sqrt{D}\, |a_{j,D}| \le R'. \qquad (4.16)$$

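• The $f_m$'s have to belong to $W_s(R)$. We have $\sigma_\lambda^2 = \rho$. Hence, for any $t > 0$, we need

$$\rho^2 1_{\rho \le \sqrt{\rho}\, t} + D a_{j,D}^2 1_{|a_{j,D}| \le \sqrt{\rho}\, t} \le R^{2-4s} t^{4s}.$$

If $|a_{j,D}| \le \rho$, then it is enough to have

$$\rho^2 + D a_{j,D}^2 \le R^{2-4s} \rho^{2s} \qquad (4.17)$$

and

$$D a_{j,D}^2 \le R^{2-4s} \Big(\frac{a_{j,D}^2}{\rho}\Big)^{2s}. \qquad (4.18)$$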
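If the parameters satisfy these equations, then

$$\mathcal{R}\big(W_s(R) \cap \mathcal{B}^s_{2,\Gamma}(R') \cap \mathcal{L}_{1,2,\infty}(R'')\big) \ge \mathcal{R}(\mathcal{M}_{j,D}),$$

where $\mathcal{R}(W_s(R) \cap \mathcal{B}^s_{2,\Gamma}(R') \cap \mathcal{L}_{1,2,\infty}(R''))$ and $\mathcal{R}(\mathcal{M}_{j,D})$ are respectively the minimax risks associated with $W_s(R) \cap \mathcal{B}^s_{2,\Gamma}(R') \cap \mathcal{L}_{1,2,\infty}(R'')$ and $\mathcal{M}_{j,D}$. By similar arguments to those of the proof of Theorem 5, one obtains

$$\mathcal{R}(\mathcal{M}_{j,D}) \ge \frac{1}{4}\, \theta D a_{j,D}^2 \inf_{\hat f} \Big(1 - \inf_{f \in \mathcal{M}_{j,D}} \mathbb{P}(\hat f = f)\Big).$$

We now use Lemma [H. Recall that (see (4.13))

$$K(\mathbb{P}_{f_{m'}}, \mathbb{P}_{f_m}) \le \frac{2}{\rho}\, n D a_{j,D}^2.$$

Hence

$$\mathcal{R}(\mathcal{M}_{j,D}) \ge \frac{\theta (1 - B)}{4}\, D a_{j,D}^2$$

as soon as the mean Kullback-Leibler distance is small enough, which is implied by

$$\frac{2}{\rho}\, n D a_{j,D}^2 \le B \sigma D \log(2^j / D). \qquad (4.19)$$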

Let us take $j$ such that $2^j \le n/\log n < 2^{j+1}$ and, with $D \le 2^j$,

$$a_{j,D}^2 = \frac{\rho^2}{4n} \log(2^j / D).$$

First note that (4.19) is automatically fulfilled as soon as $\rho \le 2B\sigma$, which is true if $\rho$ is an absolute constant small enough. Then

$$\rho + 2^{j/2} |a_{j,D}| \le \rho + 2^{j/2} \sqrt{\frac{\log n}{4n}}\, \rho \le 1.5 \rho.$$

So, if $\rho$ is an absolute constant small enough, (4.15) is satisfied. Moreover

$$2^{1+j/2} |a_{j,D}| \le 2^{1+j/2} \sqrt{\frac{\rho^2 \log n}{4n}} \le \rho.$$

This gives (4.14). Now, take an integer $D = D_n$ such that

$$D_n \sim_{n \to \infty} R^{2-4s} \Big(\frac{n}{\log n}\Big)^{1-2s}.$$

For $n$ large enough, $D_n \le 2^j$ and $D_n$ is feasible. We have, for $R$ fixed,

$$a_{j,D_n}^2 \sim_{n \to \infty} C_s \rho^2\, \frac{\log n}{n},$$

where $C_s$ is a constant only depending on $s$. Therefore,

$$\rho + 2^{js} \sqrt{D_n}\, |a_{j,D_n}| = \rho + \sqrt{C_s}\, \rho R^{1-2s} + o_n(1).$$

Since $R^{1-2s} \le R'$, it is sufficient to take $\rho$ small enough, but constant and depending only on $s$, to obtain (4.16). Moreover,



$$D_n a_{j,D_n}^2 \sim_{n \to \infty} C_s \rho^2 R^{2-4s} \Big(\frac{\log n}{n}\Big)^{2s}.$$

Hence (4.17) is equivalent to $\rho^2 < R^{2-4s} \rho^{2s}$. Since $R > 1$, this is true as soon as $\rho < 1$. Finally (4.18) is equivalent, when $n$ tends to $+\infty$, to

$$C_s \rho^2 \le (C_s \rho)^{2s}.$$

Once again this is true for $\rho$ small enough depending on $s$. As we can choose $\rho$ not depending on $R$, $R'$, $R''$, this concludes the proof.
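
The non-asymptotic part of the parameter choice can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the amplitude uses the formula $a_{j,D}^2 = \rho^2 \log(2^j/D)/(4n)$ as read from the proof above, and `B_sigma` is a placeholder for the product $B\sigma$, whose actual value comes from the lemmas and is not given here.

```python
import math


def choose_level(n):
    """Return the integer j with 2**j <= n / log(n) < 2**(j + 1)."""
    if n < 3:
        raise ValueError("this sketch assumes n >= 3")
    target = n / math.log(n)
    j = 0
    while 2 ** (j + 1) <= target:
        j += 1
    return j


def amplitude(n, rho, j, D):
    """a_{j,D} = (rho / 2) * sqrt(log(2**j / D) / n), as read from the proof."""
    return 0.5 * rho * math.sqrt(math.log(2 ** j / D) / n)


def check_requirements(n, rho, D, B_sigma):
    """Evaluate (4.14), the sup-norm in (4.15) and (4.19) for given parameters.

    B_sigma stands for the product B * sigma (hypothetical numeric value).
    """
    j = choose_level(n)
    if not 1 <= D <= 2 ** j:
        raise ValueError("D must satisfy 1 <= D <= 2**j")
    a = amplitude(n, rho, j, D)
    log_ratio = math.log(2 ** j / D)
    return {
        "j": j,
        "a": a,
        # (4.14): rho >= 2^(1 + j/2) |a_{j,D}|
        "4.14": rho >= 2 ** (1 + j / 2) * a,
        # left-hand side of (4.15), to be compared with R''
        "sup_norm": rho + 2 ** (j / 2) * a,
        # (4.19): (2 / rho) n D a^2 <= B sigma D log(2^j / D)
        "4.19": (2 / rho) * n * D * a ** 2 <= B_sigma * D * log_ratio,
    }


if __name__ == "__main__":
    print(check_requirements(n=10_000, rho=0.1, D=5, B_sigma=0.1))
```

With these values $\rho = 0.1 \le 2 B\sigma = 0.2$, so (4.19) holds, and the sup-norm stays below $1.5\rho$, in line with the bound used for (4.15).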

Corollary [T]is completely straightforward once we notice that if R' > R then for every s, R' > R 2 ~ 4s . 